Abstract

Let $A_p = \frac{YY^*}{m}$ and $B_p = \frac{XX^*}{n}$ be two independent random matrices, where $X = (X_{ij})_{p \times n}$ and $Y = (Y_{ij})_{p \times m}$ respectively consist of real (or complex) independent random variables with $E X_{ij} = E Y_{ij} = 0$ and $E|X_{ij}|^2 = E|Y_{ij}|^2 = 1$. Denote by $\lambda_1$ the largest root of the determinantal equation $\det(\lambda A_p - B_p) = 0$. We establish the Tracy-Widom type universality for $\lambda_1$ under some moment conditions on $X_{ij}$ and $Y_{ij}$ when $p/m$ and $p/n$ approach positive constants as $p \to \infty$.


KEYWORDS: Tracy-Widom distribution, largest eigenvalue, sample covariance matrix, F 
matrix. 

1 Introduction 

High-dimensional data now commonly arise in many scientific fields such as genomics, image processing, microarray, proteomics and finance, to name but a few. It is well known that the classical theory of multivariate statistical analysis for fixed dimension p and large sample size n may lose its validity when handling high-dimensional data. A popular tool in analyzing large covariance matrices, and hence high-dimensional data, is random matrix theory. The spectral analysis of high-dimensional sample covariance matrices has attracted considerable interest among statisticians, probabilists and mathematicians since the seminal work of Marčenko and Pastur [17] on the limiting spectral distribution for a class of sample covariance matrices. One can refer to the monograph of Bai and Silverstein [1] for a comprehensive summary and the references therein.

The largest eigenvalue of covariance matrices plays an important role in multivariate statistical analysis such as principal component analysis (PCA), multivariate analysis of variance (MANOVA) and discriminant analysis. One may refer to [18] for more details. In this paper we focus on the largest eigenvalue of the F type matrices. Suppose that

$$A_p = \frac{YY^*}{m}, \qquad B_p = \frac{XX^*}{n} \tag{1.1}$$

are two independent random matrices where $X = (X_{ij})_{p \times n}$ and $Y = (Y_{ij})_{p \times m}$ respectively consist of real (or complex) independent random variables with $E X_{ij} = E Y_{ij} = 0$ and $E|X_{ij}|^2 = E|Y_{ij}|^2 = 1$.
Consider the determinantal equation

$$\det(\lambda A_p - B_p) = 0. \tag{1.2}$$

When $A_p$ is invertible, the roots of (1.2) are the eigenvalues of an F matrix

$$A_p^{-1} B_p, \tag{1.3}$$

referred to as a Fisher matrix in the literature. The determinantal equation (1.2) is closely connected with the generalized eigenproblem

$$\det[\lambda(A_p + B_p) - B_p] = 0. \tag{1.4}$$

We illustrate this in the next section. Many classical multivariate statistical tests are based on the roots of (1.2) or (1.4). For instance, one may use them to test the equality of two covariance matrices and the general linear hypothesis. In the framework of multivariate analysis of variance (MANOVA), $A_p$ represents the within-group covariance matrix while $B_p$ represents the between-groups covariance matrix. A one-way MANOVA can be used to examine the hypothesis of equality of the mean vectors of interest.

Tracy and Widom [24, 25] first discovered the limiting distributions of the largest eigenvalue for the large Gaussian Wigner ensemble, now known as the Tracy-Widom law. Since their pioneering work, the study of the largest eigenvalues of large random matrices has flourished. To name a few we mention [11], [12], [6], [10] and [21]. Among them we single out El Karoui [6], which handled the largest eigenvalue of Wishart matrices for a nonnull population covariance matrix and provided a condition on the population covariance matrix that ensures the Tracy-Widom law (see (4.41) below).

A follow-up to the above results is to establish the so-called universality property for generally 
distributed large random matrices. Specifically speaking, the universality property states that 
the limiting behavior of an eigenvalue statistic usually is not dependent on the distribution of the 
matrix entries. Indeed, the Tracy-Widom law has been established for the general sample covariance 
matrices under very general assumptions on the distributions of the entries of X. The readers can 
refer to [22], [23], [8], [9], [19], [27], [3], [16], [15] for some representative developments on this 
topic. When proving universality an important tool is the Lindeberg comparison strategy (see Tao and Vu [22] and Erdős, Yau and Yin [8]), and an important input when applying Lindeberg's comparison strategy is the strong local law developed by Erdős, Schlein and Yau in [7] and Erdős, Yau and Yin in [8].

Johnstone in [13] proved that the largest root of (1.2) converges to the Tracy-Widom distribution of type one after appropriate centering and scaling when the dimension p of the matrices $A_p$ and $B_p$ is even, $\lim_{p\to\infty} p/m < 1$, and $B_p$ and $A_p$ are both Wishart matrices. It is believed that the limiting distribution should not be affected by the parity of the dimension p. Indeed, numerical investigations both in [13] and [14] suggest that the Tracy-Widom approximation in the odd dimension case works as well as in the even dimension case. Besides, as can be guessed, the Tracy-Widom approximation should not rely on the Gaussian assumption. However, theoretical support for these claims remains open. Furthermore, when $A_p$ is not invertible the limiting distribution of the largest root of (1.2) is still unknown even under the Gaussian assumption.

In this paper, we prove the universality of the largest root of (1.2) by imposing some moment conditions on $A_p$ and $B_p$. Specifically, we prove that the largest root of (1.2) converges in distribution to the Tracy-Widom law for general distributions of the entries of X and Y no matter what the dimension p is, even or odd. Moreover the result holds when $\lim_{p\to\infty} p/m < 1$ or $\lim_{p\to\infty} p/m > 1$, corresponding to invertible $A_p$ and non-invertible $A_p$. This result also implies the asymptotic distribution of the largest root of (1.4).

At this point it is also appropriate to mention some related work about the roots of (1.2). The limiting spectral distribution of the roots was derived by [26] and [1]. One may also find the limits of the largest root and the smallest root in [1]. A central limit theorem for linear spectral statistics was established in [29]. Very recently, the so-called spiked F model has been investigated by [5] and [28]. We would like to point out that they prove local asymptotic normality or asymptotic normality for the largest eigenvalue of the spiked F model, which is completely different from our setting.

We conclude this section by outlining some ideas in the proof and presenting the structure of the rest of the paper. When $A_p$ is invertible, the roots of (1.2) become those of the F matrix $A_p^{-1}B_p$ so that we may work on $A_p^{-1}B_p$. Roughly speaking, $A_p^{-1}B_p$ can be viewed as a kind of general sample covariance matrix $T_n^{1/2}XX^*T_n^{1/2}$ with $T_n$ being a population covariance matrix by conditioning on B p . Denote the largest root of (1.2) by $\lambda_1$. The key idea is to break $\lambda_1 - \mu_p$ into a sum of two parts as follows

$$\lambda_1 - \mu_p = (\lambda_1 - \hat\mu_p) + (\hat\mu_p - \mu_p), \tag{1.5}$$

where $\hat\mu_p$ is an appropriate value when B p is given and $\mu_p$ is an appropriate value when B p is not given (their definitions are given in the later sections). However we cannot condition on B p directly. Instead we first construct an appropriate event so that we can handle the first term on the right-hand side of (1.5) on the event and apply the earlier results about $T_n^{1/2}XX^*T_n^{1/2}$. In particular we need to verify the condition (4.41) below. Once this is done, the next step is to prove that the second term on the right-hand side of (1.5), after scaling, converges to zero in probability. This approach is different from that used in the literature in proving universality for the local eigenvalue statistics.

Unfortunately, when $A_p$ is not invertible we cannot work on the F matrix $A_p^{-1}B_p$ anymore. To overcome the difficulty we instead start from the determinantal equation (1.2). It turns out that the largest root $\lambda_1$ can then be linked to the largest root of some F matrix when X consists of Gaussian random variables. Therefore the result about the F matrices $A_p^{-1}B_p$ is applicable. For general distributions we find that it is equivalent to working on the following "covariance-type" matrix

$$D^{-1/2}U_1X\big(I - X^*U_2^T(U_2XX^*U_2^T)^{-1}U_2X\big)X^*U_1^TD^{-1/2}. \tag{1.6}$$

The definitions of $D$ and $U_j$, $j = 1, 2$, are given in the later section. This matrix is much more complicated than general sample covariance matrices. To deal with (1.6) we construct a $3 \times 3$ block linearization matrix

$$H = H(X) = \begin{pmatrix} -zI & 0 & D^{-1/2}U_1X \\ 0 & 0 & U_2X \\ X^TU_1^TD^{-1/2} & X^TU_2^T & -I \end{pmatrix}, \tag{1.7}$$

where $z = E + i\eta$ is a complex number with a positive imaginary part. A simple Schur complement calculation shows that the upper left block of the $3 \times 3$ block matrix $H^{-1}$ is the resolvent of (1.6) at $z$, whose normalized trace is the Stieltjes transform of (1.6). We next develop the strong local law around the right end point $\mu_p$ of the support by using a type of Lindeberg's comparison strategy raised in [15], and then use it to prove edge universality by adapting the approach used in [8] and [3].
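The block-inverse claim above can be checked numerically in the smallest possible case. The following is a minimal sketch, assuming scalar blocks: the row vectors `w1` and `w2` stand in for $D^{-1/2}U_1X$ and $U_2X$, and all function names are hypothetical. It only illustrates the Schur complement identity and is not the construction used in the proofs.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def upper_left_of_inverse(w1, w2, z):
    """(1,1) entry of H^{-1}, with H built as in (1.7) from 1 x k blocks."""
    k = len(w1)
    h = [[-z, 0j] + [complex(v) for v in w1],
         [0j, 0j] + [complex(v) for v in w2]]
    for i in range(k):
        row = [complex(w1[i]), complex(w2[i])] + [0j] * k
        row[2 + i] = -1 + 0j          # the -I block
        h.append(row)
    e1 = [1 + 0j] + [0j] * (k + 1)
    return solve(h, e1)[0]            # first column of H^{-1}, first entry


def resolvent_of_projected_matrix(w1, w2, z):
    """1 / (w1 (I - w2^T (w2 w2^T)^{-1} w2) w1^T - z), the scalar analogue of (1.6)."""
    dot11 = sum(v * v for v in w1)
    dot12 = sum(u * v for u, v in zip(w1, w2))
    dot22 = sum(v * v for v in w2)
    return 1 / (dot11 - dot12 ** 2 / dot22 - z)


w1 = [0.3, -1.2, 0.7]      # assumed stand-in for D^{-1/2} U_1 X
w2 = [1.0, 0.4, -0.5]      # assumed stand-in for U_2 X
z = 0.8 + 0.1j             # spectral parameter with positive imaginary part
lhs = upper_left_of_inverse(w1, w2, z)
rhs = resolvent_of_projected_matrix(w1, w2, z)
```

The two values `lhs` and `rhs` should agree up to rounding, which is the scalar version of the statement that the upper left block of $H^{-1}$ is the resolvent of (1.6).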

The paper is organized as follows. Section 2 gives the main results. A statistical application and the Tracy-Widom approximation are discussed in Section 3. Section 4 is devoted to proving the main result when $A_p$ is invertible. In Section 5 we show the equivalence between the asymptotic means and asymptotic variances given by [13] and by this paper, respectively. Sections 6 and 7 prove the main result when $A_p$ is not invertible.


2 The main results 

Throughout the paper we make the following conditions. 

Condition 1. Assume that $\{Z_{ij}\}$ are independent random variables with $E Z_{ij} = 0$, $E|Z_{ij}|^2 = 1$. For all $k \in \mathbb{N}$, there is a constant $C_k$ such that $E|Z_{ij}|^k \le C_k$. In addition, if $\{Z_{ij}\}$ are complex, then $E Z_{ij}^2 = 0$.

We say that a random matrix $Z = (Z_{ij})$ satisfies Condition 1 if its entries $\{Z_{ij}\}$ satisfy Condition 1.

Condition 2. Assume that random matrices $X = (X_{ij})_{p,n}$ and $Y = (Y_{ij})_{p,m}$ are independent.

Condition 3. Set $m = m(p)$ and $n = n(p)$. Suppose that

$$\lim_{p\to\infty}\frac{p}{m} = d_1 > 0, \qquad \lim_{p\to\infty}\frac{p}{n} = d_2 > 0, \qquad 0 < \lim_{p\to\infty}\frac{p}{m+n} < 1.$$

To present the main results uniformly we define $\breve m = \max\{m, p\}$, $\breve n = \min\{n, m + n - p\}$ and $\breve p = \min\{m, p\}$. Moreover let

$$\sin^2\Big(\frac{\gamma}{2}\Big) = \frac{\min\{\breve p, \breve n\} - 1/2}{\breve m + \breve n - 1}, \qquad \sin^2\Big(\frac{\psi}{2}\Big) = \frac{\max\{\breve p, \breve n\} - 1/2}{\breve m + \breve n - 1}, \tag{2.1}$$

$$\mu_{J,p} = \tan^2\Big(\frac{\gamma + \psi}{2}\Big), \qquad \sigma_{J,p}^3 = \mu_{J,p}^3\,\frac{16}{(\breve m + \breve n - 1)^2}\,\frac{1}{\sin(\gamma)\sin(\psi)\sin^2(\gamma + \psi)}. \tag{2.2}$$

Formulas (2.2) can be found in [13] when $d_1 < 1$.
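As a rough illustration of how the centering and scaling constants can be evaluated, the sketch below computes $\mu_{J,p}$ and $\sigma_{J,p}$ from a triple $(p, m, n)$. It assumes the reading $\sin^2(\gamma/2) = (\min\{\breve p, \breve n\} - 1/2)/(\breve m + \breve n - 1)$, $\sin^2(\psi/2) = (\max\{\breve p, \breve n\} - 1/2)/(\breve m + \breve n - 1)$, $\mu_{J,p} = \tan^2((\gamma+\psi)/2)$ and $\sigma_{J,p}^3 = 16\mu_{J,p}^3 / [(\breve m + \breve n - 1)^2 \sin\gamma\,\sin\psi\,\sin^2(\gamma+\psi)]$; the function name is hypothetical.

```python
import math


def johnstone_constants(p, m, n):
    """Return (mu_J, sigma_J) under the assumed reading of (2.1)-(2.2)."""
    m_b = max(m, p)              # \breve m
    n_b = min(n, m + n - p)      # \breve n
    p_b = min(m, p)              # \breve p
    denom = m_b + n_b - 1
    gamma = 2 * math.asin(math.sqrt((min(p_b, n_b) - 0.5) / denom))
    psi = 2 * math.asin(math.sqrt((max(p_b, n_b) - 0.5) / denom))
    mu = math.tan((gamma + psi) / 2) ** 2
    sigma_cubed = (mu ** 3 * 16 / denom ** 2
                   / (math.sin(gamma) * math.sin(psi) * math.sin(gamma + psi) ** 2))
    return mu, sigma_cubed ** (1 / 3)


# Two of the triples used in the simulation section, plus a large balanced one.
mu_a, sigma_a = johnstone_constants(5, 40, 10)
mu_b, sigma_b = johnstone_constants(30, 20, 25)
mu_c, sigma_c = johnstone_constants(1000, 4000, 1000)
```

For the large triple $(1000, 4000, 1000)$ both angles have $\sin^2$ close to $1/5$, so `mu_c` should be close to $\tan^2(2\arcsin\sqrt{0.2}) = 16/9$.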

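Below we present alternative expressions of $\mu_{J,p}$ and $\sigma_{J,p}$. To this end, define a modified density of the Marchenko-Pastur law [17] (MP law) by

$$\varrho_p(x) = \frac{1}{2\pi x\,\breve p/\breve m}\sqrt{(b_p - x)(x - a_p)}\; I(a_p \le x \le b_p), \tag{2.3}$$

where $a_p = (1 - \sqrt{\breve p/\breve m})^2$ and $b_p = (1 + \sqrt{\breve p/\breve m})^2$. Let $\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_{\breve p}$ satisfy

$$\int_{\gamma_j}^{+\infty}\varrho_p(x)\,dx = \frac{j}{\breve p}, \tag{2.4}$$

with $\gamma_0 = b_p$ and $\gamma_{\breve p} = a_p$. Moreover suppose that $c_p \in [0, a_p)$ satisfies the equation

$$\int_{-\infty}^{+\infty}\Big(\frac{c_p}{x - c_p}\Big)^2\varrho_p(x)\,dx = \frac{\breve n}{\breve p}. \tag{2.5}$$

One may easily check the existence and uniqueness of $c_p$. Define

$$\mu_p = \frac{1}{c_p}\Big(1 + \frac{\breve p}{\breve n}\int_{-\infty}^{+\infty}\Big(\frac{c_p}{x - c_p}\Big)\varrho_p(x)\,dx\Big) \tag{2.6}$$

and

$$\frac{1}{\sigma_p^3} = \frac{1}{c_p^3}\Big(1 + \frac{\breve p}{\breve n}\int_{-\infty}^{+\infty}\Big(\frac{c_p}{x - c_p}\Big)^3\varrho_p(x)\,dx\Big). \tag{2.7}$$

It turns out that (2.2) and (2.6)-(2.7) are equivalent subject to some scaling, which is verified in Section 5.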
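The quantities $c_p$, $\mu_p$ and $\sigma_p$ have no closed form in the displayed definitions, but they are easy to approximate numerically. The sketch below is a minimal illustration, assuming the MP density with ratio `y` $= \breve p/\breve m$, the defining equation $\int (c_p/(x - c_p))^2\varrho_p(x)\,dx = \breve n/\breve p$, and the integral formulas for $\mu_p$ and $\sigma_p$; it uses a midpoint rule and bisection, and all function names are hypothetical.

```python
import math


def mp_density(x, y):
    """Marchenko-Pastur density with ratio y in (0, 1) and unit variance."""
    a = (1 - math.sqrt(y)) ** 2
    b = (1 + math.sqrt(y)) ** 2
    if x <= a or x >= b:
        return 0.0
    return math.sqrt((b - x) * (x - a)) / (2 * math.pi * x * y)


def mp_integral(g, y, steps=4000):
    """Midpoint-rule approximation of the integral of g against the MP density."""
    a = (1 - math.sqrt(y)) ** 2
    b = (1 + math.sqrt(y)) ** 2
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += g(x) * mp_density(x, y)
    return h * total


def solve_c(y, ratio, tol=1e-8):
    """Bisection for c in (0, a_p): integral of (c/(x-c))^2 equals ratio = n/p."""
    lo, hi = 0.0, (1 - math.sqrt(y)) ** 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mp_integral(lambda x: (mid / (x - mid)) ** 2, y) < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mu_sigma(y, ratio):
    """Centering mu_p and scaling sigma_p from the integral formulas."""
    c = solve_c(y, ratio)
    i1 = mp_integral(lambda x: c / (x - c), y)
    i3 = mp_integral(lambda x: (c / (x - c)) ** 3, y)
    mu = (1 + i1 / ratio) / c
    sigma = (c ** 3 / (1 + i3 / ratio)) ** (1 / 3)
    return c, mu, sigma


total_mass = mp_integral(lambda x: 1.0, 0.25)
c_p, mu_p, sigma_p = mu_sigma(0.25, 1.0)      # p/m = 1/4 and n = p
```

With $p/m = 1/4$ and $n = p$, the closed form quoted later in the proof of Lemma 1, $c_p = (m-p)^2/(2(m+p)m)$, gives $c_p = 0.225$, and then $(n/m)\mu_p \approx 16/9 = \tan^2(2\arcsin\sqrt{0.2})$, in line with the equivalence with Johnstone's centering up to scaling.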

We also need the following moment match condition. 


Definition 1 (moment matching). Let $X^1 = (x^1_{ij})_{M \times N}$ and $X^0 = (x^0_{ij})_{M \times N}$ be two matrices satisfying Condition 1. We say that $X^1$ matches $X^0$ to order $q$ if, for the integers $i, j, l$ and $k$ satisfying $1 \le i \le M$, $1 \le j \le N$, $0 \le l, k$ and $l + k \le q$, they have the relationship

$$E\big[(\Re x^1_{ij})^l(\Im x^1_{ij})^k\big] = E\big[(\Re x^0_{ij})^l(\Im x^0_{ij})^k\big] + O\big(\exp(-(\log p)^C)\big), \tag{2.8}$$

where $C$ is some positive constant bigger than one, $\Re x$ is the real part and $\Im x$ is the imaginary part of $x$.

Throughout the paper we use $X^0$ to stand for the random matrix consisting of independent Gaussian random variables with mean zero and variance one.

Denote the type-$i$ Tracy-Widom distribution by $F_i$, $i = 1, 2$ (see [25]). Set B p = and

. We are now in a position to state the main results about F type matrices. 


A - XI 

~ rh 


Theorem 2.1. Suppose that the real random matrices X and Y satisfy Conditions 1-3. Moreover suppose that $0 < d_2 < \infty$. Denote the largest root of $\det(\lambda A_p - B_p) = 0$ by $\lambda_1$.

(i) If $0 < d_1 < 1$, then

$$\lim_{p\to\infty} P\Big(\frac{\frac{\breve n}{\breve m}\lambda_1 - \mu_{J,p}}{\sigma_{J,p}} \le s\Big) = F_1(s). \tag{2.9}$$

(ii) If $d_1 > 1$ and X matches the standard $X^0$ to order 3, then (2.9) still holds.

Remark 1. When X and Y are complex random matrices, Theorem 2.1 still holds but the Tracy-Widom distribution $F_1(s)$ should be replaced by $F_2(s)$.

If $0 < d_1 < 1$, then $A_p$ is invertible. In this case the largest eigenvalue $\lambda_1$ is that of the F matrix $A_p^{-1}B_p$. If $d_1 > 1$, then $A_p$ is not invertible.

Remark 2. Theorem 2.1 immediately implies the distribution of the largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$. In fact, since $\lambda(B_p + A_p) - B_p = \lambda A_p - (1 - \lambda)B_p$, the largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$ is $\frac{\lambda_1}{1 + \lambda_1}$ if $\lambda_1$ is the largest root of the F matrix $B_pA_p^{-1}$ in Theorem 2.1 when $0 < d_1 < 1$.

When $d_1 > 1$ the largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$ is one with multiplicity $(p - m)$. We instead consider the $(p - m + 1)$th largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$. It turns out that the $(p - m + 1)$th largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$ is $\frac{\lambda_1}{1 + \lambda_1}$ if $\lambda_1$ is the largest root of $\det(\lambda A_p - B_p) = 0$.

Moreover note the equality

$$(B_p + A_p)^{-1}B_p + (B_p + A_p)^{-1}A_p = I.$$

If Y matches $X^0$ to order 3, then the smallest positive root of $\det(\lambda(B_p + A_p) - B_p) = 0$ also tends to the type-1 Tracy-Widom distribution after appropriate centering and rescaling, by Theorem 2.1, when $d_1 > 1$ and $d_2 > 1$.
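The correspondence between the roots of $\det(\lambda A_p - B_p) = 0$ and those of $\det(\lambda(B_p + A_p) - B_p) = 0$ is the monotone map $\lambda \mapsto \lambda/(1 + \lambda)$. A toy check, assuming diagonal $A_p$ and $B_p$ so that the roots can be read off entrywise (the function names and the numbers are made up for illustration):

```python
def f_roots_diagonal(a_diag, b_diag):
    """Roots of det(lambda*A - B) = 0 for diagonal A, B: lambda_i = b_i / a_i."""
    return sorted((b / a for a, b in zip(a_diag, b_diag)), reverse=True)


def sum_form_roots_diagonal(a_diag, b_diag):
    """Roots of det(lambda*(A + B) - B) = 0 for diagonal A, B: b_i / (a_i + b_i)."""
    return sorted((b / (a + b) for a, b in zip(a_diag, b_diag)), reverse=True)


a_diag = [2.0, 0.5, 1.0]     # hypothetical diagonal of A_p
b_diag = [1.0, 3.0, 0.25]    # hypothetical diagonal of B_p

lam = f_roots_diagonal(a_diag, b_diag)
theta = sum_form_roots_diagonal(a_diag, b_diag)
mapped = [x / (1 + x) for x in lam]   # image of the F-type roots under the map
```

Because the map is increasing, the ordering is preserved, so the largest root on one side corresponds to the largest root on the other; `mapped` and `theta` should coincide entry by entry.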

We would like to point out that Johnstone [13] proved part (i) of Theorem 2.1 when p is even and $A_p$ and $B_p$ are both Wishart matrices. Part (ii) of Theorem 2.1 is new even if $A_p$ and $B_p$ are both Wishart matrices. When proving Theorem 2.1 we have in fact obtained a different asymptotic mean and variance. Precisely, we have proved that

$$\lim_{p\to\infty} P\big(\sigma_p\breve n^{2/3}(\lambda_1 - \mu_p) \le s\big) = F_1(s) \tag{2.10}$$

and that

$$\Big|\frac{\breve n}{\breve m}\mu_p - \mu_{J,p}\Big| = O(p^{-1}), \qquad \lim_{p\to\infty}\frac{\breve m}{\breve n^{1/3}}\,\sigma_p\,\sigma_{J,p} = 1. \tag{2.11}$$

(2.10) and (2.11) imply Theorem 2.1.

3 Application and Simulations


This section is to discuss some applications of our universality results in high-dimensional statistical 
inference and conduct simulations to check the quality of the approximations of our limiting law. 


3.1 Equality of two covariance matrices 


Consider the model of the following form

$$Z_1 = \Sigma_1^{1/2}X, \qquad Z_2 = \Sigma_2^{1/2}Y,$$

where X and Y are $p \times n$ and $p \times m$ random matrices satisfying the conditions of Theorem 2.1, and $\Sigma_1$ and $\Sigma_2$ are $p \times p$ invertible population covariance matrices. We are interested in testing whether $\Sigma_1 = \Sigma_2$. Formally, we focus on the following hypothesis testing problem

$$H_0: \Sigma_1 = \Sigma_2 \quad \text{vs.} \quad H_1: \Sigma_1 \ne \Sigma_2.$$

Under the null hypothesis we have

$$\det\Big(\lambda\frac{Z_2Z_2^*}{m} - \frac{Z_1Z_1^*}{n}\Big) = 0 \iff \det\Big(\lambda\frac{YY^*}{m} - \frac{XX^*}{n}\Big) = 0,$$

which implies that we can apply our theoretical result to the largest root of $\det\big(\lambda\frac{Z_2Z_2^*}{m} - \frac{Z_1Z_1^*}{n}\big) = 0$ under the null hypothesis. By Theorem 2.1 we see that $\lambda_1$ tends to the Tracy-Widom distribution after centering and rescaling.


3.2 Simulations 


We conduct some numerical simulations to check the accuracy of the distributional approximations in Theorem 2.1 under various settings of $(p, m, n)$ and the distribution of X. We also study the power of the test of equality of two covariance matrices.

As in [13] we use $\ln(\lambda_1)$ below to run simulations. To do so we first give its distribution. By [13] and (2.10) we find that

$$\lambda_1 = \mu_p + \frac{Z}{\sigma_p\breve n^{2/3}} + o_p(\breve n^{-2/3}), \tag{3.1}$$

where $Z = F_1^{-1}(U)$ and $U$ is a $U(0, 1)$ random variable. By Taylor's expansion we then have

$$\ln(\lambda_1) = \ln(\mu_p) + \frac{Z}{\mu_p\sigma_p\breve n^{2/3}} + o_p(\breve n^{-2/3}). \tag{3.2}$$

Recall the two relations in (2.11) of Section 2. Summarizing the above we find

$$\lim_{p\to\infty} P\big(\sigma_{p\ln}(\ln(\lambda_1) - \mu_{p\ln}) \le s\big) = F_1(s), \tag{3.3}$$

where

$$\mu_{p\ln} = \ln(\mu_{J,p}), \qquad \sigma_{p\ln} = \frac{\mu_{J,p}}{\sigma_{J,p}}. \tag{3.4}$$
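The Taylor step from the expansion of $\lambda_1$ to that of $\ln(\lambda_1)$ can be spelled out as follows. This is only a sketch of the standard delta-method argument, assuming an expansion of the form $\lambda_1 = \mu_p + Z/(\sigma_p\breve n^{2/3}) + o_p(\breve n^{-2/3})$ with $\mu_p$ and $\sigma_p$ bounded away from zero and infinity.

```latex
\begin{align*}
\ln(\lambda_1)
  &= \ln(\mu_p) + \ln\!\Big(1 + \frac{Z}{\mu_p\sigma_p\breve n^{2/3}} + o_p(\breve n^{-2/3})\Big) \\
  &= \ln(\mu_p) + \frac{Z}{\mu_p\sigma_p\breve n^{2/3}} + o_p(\breve n^{-2/3})
     + O_p(\breve n^{-4/3}) \\
  &= \ln(\mu_p) + \frac{Z}{\mu_p\sigma_p\breve n^{2/3}} + o_p(\breve n^{-2/3}),
\end{align*}
```

where the second line uses $\ln(1 + x) = x + O(x^2)$ with $x = O_p(\breve n^{-2/3})$. On the log scale the centering therefore becomes $\ln(\mu_p)$ and the scale is divided by $\mu_p$.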
3.2.1 Accuracy of approximations for TW laws and size

We conduct some numerical simulations to check the accuracy of the distributional approximations 
in Theorem 2.1, which include the size of the test as well. 


Table 1: Standard quantiles for several triples (p, m, n): Gaussian case.
Initial triples M0 = (5, 40, 10) and M1 = (30, 20, 25).

Percentile  TW    M0      2M0     3M0     4M0     M1      2M1     3M1     4M1     2*SE
-3.9        0.01  0.0208  0.0133  0.0124  0.0115  0.0017  0.0035  0.0048  0.0060  0.002
-3.18       0.05  0.0680  0.0601  0.0562  0.0582  0.0210  0.0276  0.0327  0.0370  0.004
-2.78       0.1   0.1176  0.1120  0.1088  0.1095  0.0608  0.0712  0.0808  0.0842  0.006
-1.91       0.3   0.3154  0.3030  0.3080  0.3084  0.2641  0.2744  0.2864  0.2909  0.009
-1.27       0.5   0.5139  0.5070  0.5051  0.5082  0.4839  0.4904  0.4960  0.4964  0.01
-0.59       0.7   0.7073  0.7154  0.7012  0.7111  0.7055  0.7031  0.7019  0.7005  0.009
0.45        0.9   0.9083  0.9058  0.9047  0.9090  0.9040  0.9010  0.9016  0.9003  0.006
0.98        0.95  0.9561  0.9544  0.9517  0.9557  0.9489  0.9530  0.9504  0.9498  0.004
2.02        0.99  0.9919  0.9909  0.9913  0.9919  0.9878  0.9887  0.9897  0.9901  0.002


Table 1 was produced in R. We set two initial triples $(p, m, n)$, $M_0 = (5, 40, 10)$ and $M_1 = (30, 20, 25)$, and then consider $2M_i$, $3M_i$ and $4M_i$, $i = 0, 1$. The triples $M_0$ and $M_1$ correspond to invertible $YY^*$ and non-invertible $YY^*$ respectively. For each case we generate 10000 pairs (X, Y) whose entries follow the standard normal distribution. We calculate the largest root of $\det\big(\lambda\frac{Z_2Z_2^*}{m} - \frac{Z_1Z_1^*}{n}\big) = 0$ to get $\ln(\lambda_1)$ and renormalize it with $\mu_{p\ln}$ and $\sigma_{p\ln}$. The "Percentile" column lists the quantiles of the $TW_1$ law corresponding to the probabilities in the "TW" column. Columns 3-10 report the values of the empirical distribution of the renormalized $\lambda_1$ at these quantiles for the various triples, and the last column lists twice the standard errors based on binomial sampling. QQ-plots corresponding to the triples (20, 160, 40) and (120, 80, 100) are also shown below.
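The "2*SE" column of the tables can be reproduced from the binomial sampling error alone. The sketch below assumes that the column reports $2\sqrt{q(1-q)/N}$ at each nominal probability $q$ with $N = 10000$ replications, rounded to three decimals; the function name is hypothetical.

```python
import math


def two_standard_errors(q, replications=10000):
    """Twice the binomial standard error of an empirical probability near q."""
    return 2 * math.sqrt(q * (1 - q) / replications)


levels = [0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99]
two_se = [round(two_standard_errors(q), 3) for q in levels]
```

Under this assumption `two_se` matches the tabulated values 0.002, 0.004, 0.006, 0.009, 0.01, 0.009, 0.006, 0.004, 0.002, which gives a quick yardstick: an empirical entry differing from its nominal level by more than the listed amount is outside the usual two-standard-error band.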
[Figure: Q-Q plots for p=20, m=160, n=40 and for p=120, m=80, n=100.]




The next two tables and graphs are the same as Table 1 and the corresponding graphs, except that we replace the Gaussian distribution by a discrete distribution and by a uniform distribution, respectively.


Table 2: Standard quantiles for several triples (p, m, n): Discrete distribution with the probability mass function $P(x = \sqrt{3}) = P(x = -\sqrt{3}) = 1/6$ and $P(x = 0) = 2/3$.
Initial triples M0 = (5, 40, 10) and M1 = (30, 20, 25).

Percentile  TW    M0      2M0     3M0     4M0     M1      2M1     3M1     4M1     2*SE
-3.9        0.01  0.0192  0.0132  0.0136  0.0123  0.0006  0.0031  0.0046  0.0047  0.002
-3.18       0.05  0.0637  0.0581  0.0571  0.0573  0.0216  0.0302  0.0321  0.0356  0.004
-2.78       0.1   0.1147  0.1101  0.1099  0.1088  0.0626  0.0733  0.0757  0.0824  0.006
-1.91       0.3   0.3100  0.2966  0.3060  0.3029  0.2665  0.2721  0.2808  0.2827  0.009
-1.27       0.5   0.5000  0.4959  0.4969  0.4996  0.4841  0.4834  0.4985  0.4899  0.01
-0.59       0.7   0.7025  0.7013  0.7099  0.7018  0.6990  0.6992  0.7109  0.6975  0.009
0.45        0.9   0.9107  0.9061  0.9071  0.9036  0.9014  0.9040  0.9059  0.9001  0.006
0.98        0.95  0.9566  0.9546  0.9538  0.9546  0.9503  0.9527  0.9526  0.9512  0.004
2.02        0.99  0.9929  0.994   0.9903  0.9914  0.9890  0.9908  0.9901  0.9894  0.002


[Figure: Q-Q plots for p=20, m=160, n=40 and for p=120, m=80, n=100 (discrete distribution).]




Table 3: Standard quantiles for several triples (p, m, n): Continuous uniform distribution $U(-\sqrt{3}, \sqrt{3})$.
Initial triples M0 = (30, 80, 40) and M1 = (80, 40, 50).

Percentile  TW    M0      2M0     3M0     4M0     M1      2M1     3M1     4M1     2*SE
-3.9        0.01  0.0098  0.0117  0.0122  0.0120  0.0101  0.0087  0.0092  0.0096  0.002
-3.18       0.05  0.0612  0.0632  0.0606  0.0592  0.0514  0.0462  0.0492  0.0482  0.004
-2.78       0.1   0.1205  0.1243  0.1208  0.1197  0.1023  0.0942  0.1033  0.0992  0.006
-1.91       0.3   0.3644  0.3542  0.351   0.3432  0.3132  0.2946  0.3101  0.3017  0.009
-1.27       0.5   0.5767  0.5575  0.5563  0.5496  0.516   0.5073  0.5151  0.5069  0.01
-0.59       0.7   0.7728  0.7540  0.7443  0.7440  0.7182  0.7123  0.714   0.7171  0.009
0.45        0.9   0.9397  0.9243  0.9181  0.9202  0.9141  0.9068  0.9071  0.9059  0.006
0.98        0.95  0.9722  0.9672  0.9599  0.9614  0.9584  0.9538  0.9556  0.9534  0.004
2.02        0.99  0.9959  0.9941  0.993   0.9922  0.9932  0.9912  0.9919  0.9916  0.002
[Figure: Q-Q plots for p=120, m=320, n=160 and for p=320, m=160, n=200.]




When considering the test of equality of two population covariance matrices, since $\Sigma_1$ is assumed to be invertible, in the null case $\Sigma_1 = \Sigma_2$ we may assume without loss of generality that $\Sigma_1 = \Sigma_2 = I$. Therefore one may also refer to Table 1 for the size of the test at the nominal significance levels.

3.2.2 Power 

We study the power of the test and consider the alternative case

$$Z_1 = \Sigma^{1/2}X, \qquad Z_2 = Y,$$

where $\Sigma \ne I$.

When $YY^*$ is invertible we choose $\Sigma = I + \tau\,\frac{p/m + r}{1 - p/m}\,e_1e_1^T$, where $r = \sqrt{\frac{p}{m} + \frac{p}{n} - \frac{p^2}{mn}}$. The reason why we choose the factor $\frac{p/m + r}{1 - p/m}$ is that when $\tau > 1$ it is a spiked F matrix and the largest eigenvalue converges weakly to a normal distribution by Proposition 11 of [5].

When $YY^*$ is not invertible, by Theorem 1.2 of [2] we find that the smallest non-zero eigenvalue of $\Sigma^{-1/2}YY^*\Sigma^{-1/2}$ is not spiked for the above $\Sigma$. So it is hard to get a spiked F matrix. Therefore we use another matrix

$$\Sigma = \mathrm{diag}(\omega, \omega, \ldots, \omega).$$

In Tables 4-6 the data X and Y are generated as in Tables 1-3 and the nominal significance level of our test is 5%.


Table 4: Power for several triples (p, m, n): Gaussian distribution.
Initial triples M0 = (5, 40, 10) and M1 = (30, 20, 25).

τ     M0      2M0     3M0     4M0     |  ω     M1      2M1     3M1     4M1
0.5   0.0672  0.0585  0.0563  0.0593  |  0.3   0.2178  0.4934  0.7071  0.8419
2     0.2763  0.3801  0.4551  0.5067  |  0.6   0.0574  0.1332  0.2241  0.3106
4     0.6291  0.816   0.9072  0.9567  |  2     0.1037  0.2166  0.3463  0.5029
6     0.8162  0.9543  0.988   0.9967  |  3     0.2242  0.5521  0.8156  0.9537
Table 5: Power for several triples (p, m, n): Discrete distribution.
Initial triples M0 = (5, 40, 10) and M1 = (30, 20, 25).

τ     M0      2M0     3M0     4M0     |  ω     M1      2M1     3M1     4M1
0.5   0.0674  0.0573  0.0576  0.0595  |  0.3   0.2101  0.4883  0.7024  0.8425
2     0.3045  0.397   0.4561  0.5171  |  0.6   0.057   0.1382  0.2176  0.3078
4     0.647   0.8137  0.8984  0.9478  |  2     0.1055  0.2232  0.3504  0.4974
6     0.8147  0.943   0.9813  0.9936  |  3     0.2254  0.5487  0.8211  0.9529


Table 6: Power for several triples (p, m, n): Continuous uniform distribution $U(-\sqrt{3}, \sqrt{3})$.
Initial triples M0 = (30, 80, 40) and M1 = (80, 40, 50).

τ     M0      2M0     3M0     4M0     |  ω     M1      2M1     3M1     4M1
0.5   0.2283  0.3188  0.3977  0.4662  |  0.3   0.9965  1.0000  1.0000  1.0000
2     1.0000  1.0000  1.0000  1.0000  |  0.6   0.7112  0.9623  0.9964  0.9999
4     1.0000  1.0000  1.0000  1.0000  |  2     0.9257  1.0000  1.0000  1.0000
6     1.0000  1.0000  1.0000  1.0000  |  3     1.0000  1.0000  1.0000  1.0000
In Tables 4-6 we find that when $\tau = 0.5 < 1$, $(\Sigma^{-1/2}YY^*\Sigma^{-1/2})^{-1}XX^*$ is not a spiked F matrix and the power is poor. When $\tau > 1$ it is a spiked F matrix and the power increases with the dimension and $\tau$. This phenomenon is due to the fact that a finite rank perturbation may not cause a significant change to the largest eigenvalue of the F matrix when it is weak enough. This phenomenon has been widely discussed for sample covariance matrices, see [10] and [3]. For the spiked F matrix one can refer to [5] and [28]. For the non-invertible case the power becomes better when $\Sigma$ is far away from $I$ ($\omega = 0.3$ or $3$). This is because when the empirical spectral distribution (ESD) of $\Sigma$ is very different from the M-P law, $\lambda_1$ may tend to another point $\mu_\Sigma$ instead of $\mu_p$. Then we may gain good power because $n^{2/3}(\mu_\Sigma - \mu_p)$ may tend to infinity.


4 Proof of Part (i) of Theorem 2.1

4.1 Two key Lemmas 

This subsection first proves two key lemmas for proving part (i) of Theorem 2.1. We begin with some notation and definitions. Throughout the paper we use $M, M_0, M_0', M_0'', M_1, M_1'$ to denote some generic positive constants whose values may differ from line to line. We also use $D$ to denote sufficiently large positive constants whose values may differ from line to line. We say that an event $\mathcal{A}$ holds with high probability if for any big positive constant $D$

$$P(\mathcal{A}^c) \le n^{-D}$$

for sufficiently large n. Recall the definition of $\gamma_j$ in (2.4). Let $c_{p,0} \in [0, a_p)$ satisfy

$$\frac{1}{p}\sum_{j=1}^{p}\Big(\frac{c_{p,0}}{\gamma_j - c_{p,0}}\Big)^2 = \frac{n}{p}. \tag{4.1}$$

Existence of $c_{p,0}$ will be verified in Lemma 1 below. Moreover define

$$\mu_{p,0} = \frac{1}{c_{p,0}}\Big(1 + \frac{1}{n}\sum_{j=1}^{p}\frac{c_{p,0}}{\gamma_j - c_{p,0}}\Big), \qquad \frac{1}{\sigma_{p,0}^3} = \frac{1}{c_{p,0}^3}\Big(1 + \frac{1}{n}\sum_{j=1}^{p}\Big(\frac{c_{p,0}}{\gamma_j - c_{p,0}}\Big)^3\Big). \tag{4.2}$$

Set $A_p = \frac{1}{m}YY^*$ and $B_p = \frac{1}{n}XX^*$. Rank the eigenvalues of the matrix $A_p$ as $\hat\gamma_1 \ge \hat\gamma_2 \ge \cdots \ge \hat\gamma_p$. Let $\hat c_p \in [0, \hat\gamma_p)$ satisfy

$$\frac{1}{p}\sum_{j=1}^{p}\Big(\frac{\hat c_p}{\hat\gamma_j - \hat c_p}\Big)^2 = \frac{n}{p}. \tag{4.3}$$

The existence of $\hat c_p$ with high probability will be given in Lemma 2 below. Moreover set

$$\hat\mu_p = \frac{1}{\hat c_p}\Big(1 + \frac{1}{n}\sum_{j=1}^{p}\frac{\hat c_p}{\hat\gamma_j - \hat c_p}\Big), \qquad \frac{1}{\hat\sigma_p^3} = \frac{1}{\hat c_p^3}\Big(1 + \frac{1}{n}\sum_{j=1}^{p}\Big(\frac{\hat c_p}{\hat\gamma_j - \hat c_p}\Big)^3\Big). \tag{4.4}$$

We now discuss the properties of $c_p$, $c_{p,0}$, $\hat c_p$, $\mu_p$, $\mu_{p,0}$, $\hat\mu_p$, $\sigma_p$, $\sigma_{p,0}$ defined in (2.5)-(2.7) and (4.1)-(4.4) in the next two lemmas. These lemmas are crucial to the proof strategy, which transforms F matrices into an appropriate sample covariance matrix.
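For a given set of eigenvalues, the data-driven quantities $\hat c_p$, $\hat\mu_p$ and $\hat\sigma_p$ can be computed directly. The sketch below is a minimal illustration, assuming the defining equation $\frac{1}{p}\sum_j (\hat c_p/(\hat\gamma_j - \hat c_p))^2 = n/p$ and the corresponding sums for $\hat\mu_p$ and $\hat\sigma_p$; the bisection solver and all names are choices made here for illustration.

```python
def solve_c_hat(gammas, n, tol=1e-12):
    """Bisection for c in (0, min gamma): (1/p) * sum (c/(g - c))^2 = n/p."""
    p = len(gammas)
    lo, hi = 0.0, min(gammas)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        value = sum((mid / (g - mid)) ** 2 for g in gammas) / p
        if value < n / p:
            lo = mid          # left-hand side is increasing in c
        else:
            hi = mid
    return 0.5 * (lo + hi)


def centering_and_scaling(gammas, n):
    """Return (c_hat, mu_hat, sigma_hat) for eigenvalues `gammas` and sample size n."""
    c = solve_c_hat(gammas, n)
    s1 = sum(c / (g - c) for g in gammas)
    s3 = sum((c / (g - c)) ** 3 for g in gammas)
    mu = (1 + s1 / n) / c
    sigma = (c ** 3 / (1 + s3 / n)) ** (1 / 3)
    return c, mu, sigma


# Sanity check with all eigenvalues equal to one (A_p = I), p = 10 and n = 40.
c_hat, mu_hat, sigma_hat = centering_and_scaling([1.0] * 10, 40)
```

When all eigenvalues equal one, the equation reduces to $c/(1-c) = \sqrt{n/p}$, so with $p/n = 1/4$ one gets $\hat c_p = 2/3$ and $\hat\mu_p = (1 + \sqrt{p/n})^2 = 2.25$, the right edge of the Marchenko-Pastur law for $XX^*/n$.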
Lemma 1. Under the conditions in Theorem 2.1, there exists a constant $M_0$ such that

$$\sup_p\Big\{\frac{c_p}{\gamma_p - c_p}\Big\} \le M_0, \qquad \sup_p\Big\{\frac{c_{p,0}}{\gamma_p - c_{p,0}}\Big\} \le M_0, \tag{4.5}$$

$$\lim_{p\to\infty} n^{2/3}|\mu_p - \mu_{p,0}| = 0, \tag{4.6}$$

$$\lim_{p\to\infty}\frac{\sigma_p}{\sigma_{p,0}} = 1, \qquad \limsup_p\frac{c_{p,0}}{a_p} < 1. \tag{4.7}$$

Proof. The exact expression of $c_p$ in (2.5) can be worked out under the conditions in Theorem 2.1 (see Section 5). In fact, when $n = p$, from (5.9) below we have

$$c_p = \frac{(m - p)^2}{2(m + p)m}.$$

Recall the definition of $a_p$ in (2.3). It follows that

$$\frac{c_p}{a_p} = \frac{(\sqrt{m} + \sqrt{p})^2}{2(m + p)},$$

which further implies that

$$\limsup_p\frac{c_p}{a_p} < 1. \tag{4.8}$$
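The step from the ratio to the strict bound can be made explicit. A short worked derivation, assuming $a_p = (1 - \sqrt{p/m})^2$, the closed form of $c_p$ for $n = p$ quoted above, and $p/m \to d_1 \in (0, 1)$:

```latex
\begin{align*}
\frac{c_p}{a_p}
  &= \frac{(m-p)^2}{2(m+p)m}\cdot\frac{m}{(\sqrt{m}-\sqrt{p})^2}
   = \frac{(\sqrt{m}+\sqrt{p})^2}{2(m+p)}
   = \frac{1}{2} + \frac{\sqrt{mp}}{m+p}, \\
\lim_{p\to\infty}\frac{c_p}{a_p}
  &= \frac{1}{2} + \frac{\sqrt{d_1}}{1+d_1} < 1,
\end{align*}
```

since $2\sqrt{d_1} \le 1 + d_1$ with equality only when $d_1 = 1$, which is excluded in part (i).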

In view of this, there are two constants $M_0 > 0$ and $M_0' > 0$ such that

$$\sup_p\Big\{\frac{c_p}{a_p - c_p}\Big\} \le M_0, \qquad \inf_p\{c_p\} \ge M_0'. \tag{4.9}$$

When $n \ne p$, from (5.7) below we have

$$c_p = \frac{n(m + p)(m + n - p) - (m + 2n - p)\sqrt{mnp(m + n - p)}}{m(n - p)(m + n)}.$$

Using the above expression for $c_p$ one may similarly obtain (4.8)-(4.9) as well, but with tedious calculations, and we omit the details here.

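Now we define a function $f_1(x)$ by

$$f_1(x) = \frac{1}{p}\sum_{j=1}^{p}\Big(\frac{x}{\gamma_j - x}\Big)^2. \tag{4.10}$$

We claim that there exists $c_{p,0} \in (0, a_p)$ so that

$$f_1(c_{p,0}) = \frac{n}{p}. \tag{4.11}$$

Indeed, due to (4.8) and $\int_{\gamma_j}^{\gamma_{j-1}}\varrho_p(x)\,dx = 1/p$ we obtain

$$f_1(c_p) = \sum_{j=1}^{p}\int_{\gamma_j}^{\gamma_{j-1}}\Big(\frac{c_p}{\gamma_j - c_p}\Big)^2\varrho_p(x)\,dx \ge \int_{\gamma_p}^{\gamma_0}\Big(\frac{c_p}{x - c_p}\Big)^2\varrho_p(x)\,dx.$$

This, together with (2.3) and (2.5), implies that

$$f_1(c_p) \ge \frac{n}{p} \tag{4.12}$$

and

$$\frac{n}{p} = \int_{\gamma_p}^{\gamma_0}\Big(\frac{c_p}{x - c_p}\Big)^2\varrho_p(x)\,dx \ge \frac{1}{p}\sum_{j=1}^{p}\Big(\frac{c_p}{\gamma_{j-1} - c_p}\Big)^2. \tag{4.13}$$

Note that $f_1(x)$ is a continuous function on $(0, a_p)$ and $f_1(0) = 0$. These, together with (4.12), ensure that there exists $c_{p,0} \in (0, c_p]$ so that (4.11) holds, as claimed.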

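We next develop an upper bound for the difference between $c_{p,0}$ and $c_p$. It follows from (4.12) and (4.13) that

$$\Big|f_1(c_p) - \frac{n}{p}\Big| = \frac{1}{p}\sum_{j=1}^{p}\Big(\frac{c_p}{\gamma_j - c_p}\Big)^2 - \int_{\gamma_p}^{\gamma_0}\Big(\frac{c_p}{x - c_p}\Big)^2\varrho_p(x)\,dx \le \frac{1}{p}\sum_{j=1}^{p}\Big(\Big(\frac{c_p}{\gamma_j - c_p}\Big)^2 - \Big(\frac{c_p}{\gamma_{j-1} - c_p}\Big)^2\Big)$$

$$\le \frac{2c_p^2(b_p - c_p)}{p(a_p - c_p)^4}\sum_{j=1}^{p}|\gamma_j - \gamma_{j-1}| \le \frac{2M_0^4\,b_p(b_p - a_p)}{(M_0')^2\,p},$$

where the last inequality uses (4.8)-(4.9) and $\sum_{j=1}^{p}|\gamma_j - \gamma_{j-1}| = b_p - a_p$. Taking $M_1'$ to be an upper bound for the constant multiplying $1/p$ in the last bound, the above inequality becomes

$$\Big|f_1(c_p) - \frac{n}{p}\Big| \le \frac{M_1'}{p}. \tag{4.14}$$

Moreover taking the derivative of $f_1(x)$ in (4.10) yields

$$f_1'(x) = \frac{1}{p}\sum_{j=1}^{p}\Big(\frac{2x^2}{(\gamma_j - x)^3} + \frac{2x}{(\gamma_j - x)^2}\Big). \tag{4.15}$$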


When $0 < x \le c_p$ (smaller than $a_p$) and $f_1(x) \ge \frac{n}{2p}$,

$$f_1'(x) \ge \frac{1}{p}\sum_{j=1}^{p}\frac{2x}{(\gamma_j - x)^2} = \frac{2}{x}f_1(x) \ge \frac{n}{px} \ge \frac{n}{pa_p}. \tag{4.16}$$

When $c_{p,0} \le x \le c_p$ we always have $f_1(x) \ge \frac{n}{p}$ via (4.11) because $f_1'(x) > 0$ by (4.15). Via (4.14) and (4.16) we then obtain from the mean value theorem that

$$|c_{p,0} - c_p| \le \frac{M_1'a_p}{n}. \tag{4.17}$$

This, together with (4.9), implies that there is a constant $M_1 > 0$ such that when p is big enough,

$$M_1 \le c_{p,0} \le c_p. \tag{4.18}$$

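We conclude from (2.6), (4.2), (4.9), (4.17) and (4.18) that

$$|\mu_p - \mu_{p,0}| \le \Big|\frac{1}{c_p} - \frac{1}{c_{p,0}}\Big| + \frac{1}{n}\sum_{j=1}^{p}\max\Big\{\Big|\frac{1}{\gamma_j - c_{p,0}} - \frac{1}{\gamma_j - c_p}\Big|,\ \Big|\frac{1}{\gamma_j - c_{p,0}} - \frac{1}{\gamma_{j-1} - c_p}\Big|\Big\}$$

$$\le \frac{|c_p - c_{p,0}|}{M_1^2} + \frac{1}{n}\sum_{j=1}^{p}\frac{(|\gamma_j - \gamma_{j-1}| + |c_p - c_{p,0}|)M_0^2}{M_1^2} \le \frac{a_pM_1'}{nM_1^2} + \frac{(b_p - a_p)M_0^2}{nM_1^2} + \frac{pM_0^2M_1'a_p}{n^2M_1^2}.$$

Similarly one can prove that

$$\Big|\frac{1}{\sigma_p^3} - \frac{1}{\sigma_{p,0}^3}\Big| = O\Big(\frac{1}{p}\Big). \tag{4.19}$$

(4.6) and the first result in (4.7) then follow. From (4.8) and (4.17) one can also obtain (4.5) and the second result in (4.7). □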


Lemma 2. Under the conditions in Theorem 2.1, for any $\zeta > 0$ there exists a constant $M_0'' \ge M_0$ such that

$$\sup_p\Big\{\frac{\hat c_p}{\hat\gamma_p - \hat c_p}\Big\} \le M_0'', \qquad \limsup_p\frac{\hat c_p}{\hat\gamma_p} < 1, \tag{4.20}$$

and

$$\lim_{p\to\infty} n^{2/3}|\hat\mu_p - \mu_{p,0}| = 0, \qquad \lim_{p\to\infty}\frac{\hat\sigma_p}{\sigma_{p,0}} = 1 \tag{4.21}$$

hold with high probability. Indeed (4.20) and (4.21) hold on the event $S_\zeta$ defined by

$$S_\zeta = \big\{\forall j,\ 1 \le j \le p,\ |\hat\gamma_j - \gamma_j| \le p^{\zeta}p^{-2/3}\tilde j^{-1/3}\big\}, \tag{4.22}$$

where $\zeta$ is a sufficiently small positive constant and $\tilde j = \min\{\min\{m, p\} + 1 - j,\ j\}$.


Proof. Define a function /(x) 

by 




/( x ) = ^E( 

^ 3 = 1 

x \2 
‘7j-* 

(4.23) 

From (4.1) and (4.10) we have 

n 

P’ 



fl(Cpfi) 

(4.24) 

The first aim is to find c p 

G [0 , t p ) to satisfy 




f{c P ) = 

n 

V 

(4.25) 


When f is small enough we conclude from (4.23), (4.24) and (4.35) that on the event S^ 


\f(c P ,o) - ~\ = |/(cp, 0 ) - /l(Cp, 0 )| = |- ^(( 

P 7j-Cp,o Ti-Cp,o 


1 F 

Cp ’° -) 2 -(- CP ’° ^ 


|Tj + Ti - 2Cp,o| _ y- _ 

P “r L (Ti-Cp,o) 2 (Ti-Cp,o) 2l ^ l7j 7jl 


< max{ 


\p(p 2 /3 j 1 /3| + 2| Tj - Cp,o| 

P i L (-pCp-2/ 3 j-1/3 + 7i _ Cp,o) 2 (Ti - c P,o) : 


< m ax { 


-}^2p C P 2/3 j 1/3 = 0(p C x ), (4.26) 


i=i 
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where the last step uses the fact that via (4.5) and (4.18) 

\p^p~ 2/3 j~ 1/3 + 2 jj — 2c p0 | t ^ 

maxi-— ; --- — - \ < M. 

i (-pCp-2/3j-i/3 + 7j - c Pi0 ) 2 (7i - c p,o) 2 

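Taking the derivative of (4.23) yields

$$f'(x) = \frac1p\sum_{j=1}^p\Big(\frac{2x}{(\hat\gamma_j - x)^2} + \frac{2x^2}{(\hat\gamma_j - x)^3}\Big). \tag{4.27}$$

When $0 < c_{p,0} - p^{-1/2} \le x \le c_{p,0} + p^{-1/2}$, from (4.5) and (4.18) we have on the event $S_\zeta$

$$\begin{aligned}
|f(c_{p,0}) - f(x)| &= \Big|\frac1p\sum_{j=1}^p\frac{c_{p,0}^2(\hat\gamma_j - x)^2 - x^2(\hat\gamma_j - c_{p,0})^2}{(\hat\gamma_j - x)^2(\hat\gamma_j - c_{p,0})^2}\Big| \\
&= \Big|\frac1p\sum_{j=1}^p\frac{(c_{p,0} - x)\hat\gamma_j[c_{p,0}(\hat\gamma_j - x) + x(\hat\gamma_j - c_{p,0})]}{(\hat\gamma_j - x)^2(\hat\gamma_j - c_{p,0})^2}\Big| = O(p^{-1/2}).
\end{aligned} \tag{4.28}$$

When $0 < x < \hat\gamma_p$ we have

$$f'(x) \ge \frac1p\sum_{j=1}^p\frac{2x}{(\hat\gamma_j - x)^2} = \frac2x f(x) \ge \frac{2}{c_{p,0} + p^{-1/2}}f(x).$$

In view of this, (4.26) and (4.28) there exists $M_2 > 0$ so that

$$f'(x) \ge M_2, \tag{4.29}$$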


for sufficiently large $p$ when $0 < x < \hat\gamma_p$. On the event $S_\zeta$, applying the mean value theorem yields

$$f(c_{p,0} - p^{-1/2}) \le f(c_{p,0}) - M_2p^{-1/2}$$

and

$$f(c_{p,0} + p^{-1/2}) \ge f(c_{p,0}) + M_2p^{-1/2}.$$

It follows from (4.26) that when $p$ is large enough,

$$f(c_{p,0} - p^{-1/2}) < \frac np < f(c_{p,0} + p^{-1/2}).$$

Since $f(x)$ is continuous on $(0,\hat\gamma_p)$ there is $c_p\in[0,\hat\gamma_p)$ ($c_{p,0} < c_p < a_p = \gamma_p$ by Lemma 1) so that (4.25) holds and

$$c_{p,0} - p^{-1/2} < c_p < c_{p,0} + p^{-1/2}.$$

From (4.26), (4.25) and (4.29) we have

$$|c_{p,0} - c_p| = O(p^{\zeta-1}). \tag{4.30}$$

Recall $a_p = \gamma_p$. The second inequality in (4.20) holds on the event $S_\zeta$ due to (4.7), (4.22) and (4.30). Likewise on the event $S_\zeta$, in view of (4.5) and (4.30), there exists a constant $M_\zeta > M_0$ such that

$$\sup_p\Big\{\frac{c_p}{\hat\gamma_p - c_p}\Big\} \le M_\zeta,$$

the first inequality in (4.20).

Due to c p < t p and the definition of f(x) in (4.23) we have 


(_ n _)2 

7 p ~ Cp p 




which implies that 

It follows from (4.2) and (4.4) that 
I f^p,o — Apl — 


dp T 


7 p 


1 + J~p 


, 1 1 1 ^. 1 1 

I-~| 4— / I-v-~| 

Cpp c p n ^,_ 1 7 j — Cpfl 7 j — c p 
< \cpfl — c p | ~ 7?I + I c p,o — c p 


(4.31) 


(4.32) 


CpfiCp n (p/p c Pi o)(7p dp) 

We then conclude from (4.30)-(4.32) that on the event $S_\zeta$

$$|\mu_{p,0} - \hat\mu_p| = O(p^{\zeta-1}). \tag{4.33}$$

Similarly one can prove that

|4- (434) 

a P < 0 

(4.21) then holds on the event $S_\zeta$. Moreover, by Theorem 3.3 of [19], for any small $\zeta > 0$ and any $D > 0$,

$$P(S_\zeta^c) \le p^{-D}. \tag{4.35}$$

The proof is therefore complete. □
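The existence argument for $c_p$ in the proof above (a monotone function crossing the level $n/p$) can be mirrored numerically. The following is only a minimal sketch: the eigenvalue list `gammas` and the ratio `n_over_p` are hypothetical illustrative inputs, not values from the paper, and bisection is used here simply because $f$ in (4.23) is increasing on $[0, \min_j \hat\gamma_j)$.

```python
def f(x, gammas):
    """f(x) = (1/p) * sum_j (x / (gamma_j - x))^2, as in (4.23)."""
    p = len(gammas)
    return sum((x / (g - x)) ** 2 for g in gammas) / p


def solve_c(gammas, ratio, tol=1e-12):
    """Bisection for the root of f(c) = ratio on [0, min(gammas))."""
    lo, hi = 0.0, min(gammas)
    # f(0) = 0 < ratio and f(x) -> +infinity as x increases to min(gammas),
    # so the root is bracketed; f is strictly increasing on this interval.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid, gammas) < ratio:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical eigenvalues of A_p (illustrative numbers only) and ratio n/p.
gammas = [0.35, 0.6, 0.9, 1.3, 1.8]
n_over_p = 2.0
c_p = solve_c(gammas, n_over_p)
```

The returned `c_p` lies strictly below the smallest eigenvalue, which is the numerical counterpart of $c_p\in[0,\hat\gamma_p)$.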


4.2 Proof of Part (i) of Theorem 2.1 

Proof. Recall the definition of the matrices $A_p$ and $B_p$ above (4.3). Define an F matrix $F = A_p^{-1}B_p$ whose largest eigenvalue is $\lambda_1$ according to the definition of $\lambda_1$ in Theorem 2.1. It then suffices to find the asymptotic distribution of $\lambda_1$ to prove Theorem 2.1.
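The identification of the largest root of $\det(\lambda A_p - B_p) = 0$ with the largest eigenvalue of $A_p^{-1}B_p$ can be checked on a small simulated example. This is an illustrative sketch only; the sizes `p, m, n` and the random seed are arbitrary assumptions, chosen with $p < m$ so that $A_p$ is invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 4, 12, 9  # illustrative sizes with p < m so that A is invertible
X = rng.standard_normal((p, n))
Y = rng.standard_normal((p, m))
A = Y @ Y.T / m
B = X @ X.T / n

# Largest eigenvalue of F = A^{-1} B. F is similar to a symmetric positive
# semidefinite matrix, so its eigenvalues are real and nonnegative.
lam1 = np.max(np.linalg.eigvals(np.linalg.solve(A, B)).real)

# lam1 is a root of det(lam * A - B) = 0, i.e. lam1 * A - B is singular.
smallest_sv = np.linalg.svd(lam1 * A - B, compute_uv=False)[-1]

# For any lam > lam1 the matrix lam * A - B is positive definite,
# so no larger root exists.
eigs_above = np.linalg.eigvalsh(1.001 * lam1 * A - B)
```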

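Recalling the definition of the event $S_\zeta$ in (4.22) we may write

$$P\big(\sigma_pn^{2/3}(\lambda_1 - \mu_p)\le s\big) = P\big((\sigma_pn^{2/3}(\lambda_1 - \mu_p)\le s)\cap S_\zeta\big) + P\big((\sigma_pn^{2/3}(\lambda_1 - \mu_p)\le s)\cap S_\zeta^c\big).$$

This, together with (4.35), implies that (2.10) is equivalent to

$$\lim_{p\to\infty}P\big((\sigma_pn^{2/3}(\lambda_1 - \mu_p)\le s)\cap S_\zeta\big) = F_1(s). \tag{4.36}$$

Write

$$\sigma_pn^{2/3}(\lambda_1 - \mu_p) = \frac{\sigma_p}{\hat\sigma_p}\hat\sigma_pn^{2/3}(\lambda_1 - \hat\mu_p) + \sigma_pn^{2/3}(\hat\mu_p - \mu_p) \tag{4.37}$$

(see (4.3) and (4.4) for $\hat\sigma_p$ and $\hat\mu_p$). Note that the eigenvalues of $A_p^{-1}$ are $\frac1{\hat\gamma_1}\le\frac1{\hat\gamma_2}\le\cdots\le\frac1{\hat\gamma_p}$. Rewrite (4.3) as

$$\frac1p\sum_{j=1}^p\Big(\frac{c_p/\hat\gamma_j}{1 - c_p/\hat\gamma_j}\Big)^2 = \frac np. \tag{4.38}$$

Also recast (4.4) as

$$\hat\mu_p = \frac1{c_p}\Big(1 + \frac pn\cdot\frac1p\sum_{j=1}^p\frac{c_p/\hat\gamma_j}{1 - c_p/\hat\gamma_j}\Big), \qquad \frac1{\hat\sigma_p^3} = \frac1{c_p^3}\Big(1 + \frac pn\cdot\frac1p\sum_{j=1}^p\Big(\frac{c_p/\hat\gamma_j}{1 - c_p/\hat\gamma_j}\Big)^3\Big). \tag{4.39}$$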


Up to this stage the result about the largest eigenvalue of the sample covariance matrices $ZZ^*\Sigma$, with $\Sigma$ being the population covariance matrix, comes into play, where $Z$ is of size $p\times n$ satisfying Condition 1 and $\Sigma$ is of size $p\times p$. A key condition to ensure Tracy-Widom's law for the largest eigenvalue is that if $\rho\in(0,1/\sigma_1)$ is the solution to the equation

$$\int\Big(\frac{t\rho}{1 - t\rho}\Big)^2dF^{\Sigma}(t) = \frac np \tag{4.40}$$

then

$$\limsup_p\rho\sigma_1 < 1 \tag{4.41}$$

(one may see [6], Conditions 1.2 and 1.4 and Theorem 1.3 of [3], Conditions 2.21 and 2.22 and Theorem 2.18 of [15]). Here $F^{\Sigma}(t)$ denotes the empirical spectral distribution of $\Sigma$ and $\sigma_1$ means the largest eigenvalue of $\Sigma$. Now given $A_p$, if we treat $A_p^{-1}$ as $\Sigma$, then (4.41) is satisfied on the event $S_\zeta$ due to (4.3) and (4.20) in Lemma 2. It follows from Theorem 1.3 of [3] and Theorem 2.18 of [15] that


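$$\lim_{p\to\infty}P\big((\hat\sigma_pn^{2/3}(\lambda_1 - \hat\mu_p)\le s)\cap S_\zeta\,\big|\,A_p\big) = F_1(s), \tag{4.42}$$

which implies that

$$\lim_{p\to\infty}P\big((\hat\sigma_pn^{2/3}(\lambda_1 - \hat\mu_p)\le s)\cap S_\zeta\big) = F_1(s). \tag{4.43}$$

Moreover by Lemmas 1 and 2 we obtain on the event $S_\zeta$

$$\lim_{p\to\infty}\frac{\sigma_p}{\hat\sigma_p} = 1 \tag{4.44}$$

and

$$\lim_{p\to\infty}\sigma_pn^{2/3}(\hat\mu_p - \mu_p) = 0. \tag{4.45}$$

(4.36) then follows from (4.37), (4.42)-(4.45) and Slutsky's theorem. The proof is complete. □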
5 Proof of (2.11)


Proof. This section is to verify (2.11) and, in the meantime, to give exact expressions of $c_p$, $\mu_p$ and $\sigma_p$ in (2.5)-(2.7). We first introduce the following notation. Let $\breve m = \max\{m,p\}$, $\breve n = \min\{n, m+n-p\}$ and $\breve p = \min\{m,p\}$. Choose $0 < \alpha_p < \frac\pi2$ and $0 < \beta_p < \frac\pi2$ to satisfy

$$\sin^2(\alpha_p) = \frac{\breve p}{\breve m + \breve n}, \qquad \sin^2(\beta_p) = \frac{\breve n}{\breve m + \breve n}. \tag{5.1}$$

Define

$$\mu_p = \frac{\breve m}{\breve n}\tan^2(\alpha_p + \beta_p) \tag{5.2}$$

and

$$\frac1{\sigma_p^3} = \mu_p^3\,\frac{16\breve n^2}{(\breve m + \breve n)^2}\,\frac{1}{\sin(2\beta_p)\sin(2\alpha_p)\sin^2(2\beta_p + 2\alpha_p)}. \tag{5.3}$$

We below first verify the equivalence between (2.6)-(2.7) and (5.2)-(5.3). For definiteness, consider $p < m$ in what follows; the case $p > m$ can be discussed similarly. Denote by $s(z)$ the Stieltjes transform of the MP law $\rho_p(x)$,

$$s(z) = \int\frac{\rho_p(x)}{x - z}dx, \qquad \mathrm{Im}(z) > 0,$$

and set

$$g_p(x) = \frac{1 - \frac pm - x - \sqrt{(x - 1 - \frac pm)^2 - 4\frac pm}}{2\frac pm x}, \tag{5.4}$$

which is the function obtained from $s(z)$ by replacing $z$ with $x$ (one may see (3.3.2) of [1]). Evidently, the derivative of $s(z)$ is

$$s'(z) = \int\frac{\rho_p(x)}{(x - z)^2}dx.$$

Note that $c_p$ is outside the support of the MP law (see Lemma 1). In view of the above and (2.5) we obtain

$$c_p^2g_p'(c_p) = \frac np, \tag{5.5}$$

which further implies that

$$\frac{(1 - \frac pm)^2 - (1 + \frac pm)c_p}{\sqrt{(c_p - 1 - \frac pm)^2 - 4\frac pm}} = \frac{2n}m + 1 - \frac pm. \tag{5.6}$$

When $n\ne p$, solving (5.6) and disregarding the solution bigger than $a_p$ we have

$$c_p = \frac{n(m+p)(m+n-p) - (m+2n-p)\sqrt{mnp(m+n-p)}}{m(n-p)(m+n)}. \tag{5.7}$$
This, together with (2.6), yields

$$\begin{aligned}
\mu_p &= \frac1{c_p} + \frac pn g_p(c_p) = \frac1{c_p}\,\frac{2(m+n-p)}{m+2n-p} - \frac{(n-p)m}{(m+2n-p)n} \\
&= \frac mn\,\frac{(n-p)\big(n(m+n-p) + \sqrt{mnp(m+n-p)}\big)}{n(m+p)(m+n-p) - (m+2n-p)\sqrt{mnp(m+n-p)}} \\
&= \frac mn\,\frac{\big(\sqrt{(m+n-p)n} + \sqrt{mp}\big)^2}{\big(\sqrt{m(m+n-p)} - \sqrt{np}\big)^2}.
\end{aligned}$$

By (5.1) one may obtain

$$\frac{\sqrt{(m+n-p)n} + \sqrt{mp}}{\sqrt{m(m+n-p)} - \sqrt{np}} = \frac{\frac{\sqrt{(m+n-p)n} + \sqrt{mp}}{m+n}}{\frac{\sqrt{m(m+n-p)} - \sqrt{np}}{m+n}} = \frac{\cos\alpha_p\sin\beta_p + \sin\alpha_p\cos\beta_p}{\cos\alpha_p\cos\beta_p - \sin\alpha_p\sin\beta_p} = \tan(\alpha_p + \beta_p).$$

It follows that

$$\mu_p = \frac mn\tan^2(\alpha_p + \beta_p),$$

which is (5.2).

Using (5.1), (5.2) and the second derivative of $g_p(x)$ at $c_p$, (2.7) can be rewritten as

$$\begin{aligned}
\frac1{\sigma_p^3} &= \frac1{c_p^3} + \frac p{2n}g_p''(c_p) \\
&= \cos^2(\beta_p)\cot^3(\beta_p)\csc(\alpha_p)\sec(\alpha_p)\sec^4(\beta_p + \alpha_p)\tan^4(\beta_p + \alpha_p) \\
&= 16\cos^4(\beta_p)\cot^2(\beta_p)\csc(2\beta_p)\csc(2\alpha_p)\csc^2(2\beta_p + 2\alpha_p)\tan^6(\beta_p + \alpha_p) \\
&= 16\,\frac{m^3}{(m+n)^2n}\,\frac{\tan^6(\beta_p + \alpha_p)}{\sin(2\beta_p)\sin(2\alpha_p)\sin^2(2\beta_p + 2\alpha_p)} \\
&= \mu_p^3\,\frac{16n^2}{(m+n)^2\sin(2\beta_p)\sin(2\alpha_p)\sin^2(2\beta_p + 2\alpha_p)},
\end{aligned}$$

which is (5.3).

When $n = p$, solving (5.6) yields

$$c_p = \frac{(1 - \frac pm)^2}{2(1 + \frac pm)} = \frac{(m-p)^2}{2(m+p)m}. \tag{5.8}$$

From (5.1) one may conclude that $\alpha_p = \beta_p$. Since

$$\mu_p = \frac1{c_p} + \frac pn g_p(c_p) = \frac1{c_p}\,\frac{2(m+n-p)}{m+2n-p} = \frac{4m^2}{(m-p)^2} \tag{5.9}$$

and

$$\frac{2\sqrt{mp}}{m-p} = \frac{\frac{2\sqrt{mp}}{m+p}}{\frac{m-p}{m+p}} = \frac{\sin(2\alpha_p)}{\cos(2\alpha_p)} = \tan(\alpha_p + \beta_p),$$

we have

$$\mu_p = \frac mn\tan^2(\alpha_p + \beta_p). \tag{5.10}$$

Similarly one can prove

$$\frac1{\sigma_p^3} = \mu_p^3\,\frac{16n^2}{(m+n)^2}\,\frac1{\sin(2\beta_p)\sin(2\alpha_p)\sin^2(2\beta_p + 2\alpha_p)}. \tag{5.11}$$

The above implies the equivalence between (2.6)-(2.7) and (5.2)-(5.3).

It is straightforward to verify that $|\mu_{J,p} - \mu_p| = O(p^{-1})$ and $\lim_{p\to\infty}\sigma_p/\sigma_{J,p} = 1$ according to (5.1)-(5.3) and (2.2). □
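The closed forms in this section lend themselves to a quick numerical cross-check. The sketch below is restricted to the case $p < m$ treated explicitly in the proof and assumes the readings $\sin^2\alpha_p = p/(m+n)$, $\sin^2\beta_p = n/(m+n)$ for (5.1). The function names are illustrative, and `f_edge` uses the standard right-edge formula for the limiting spectrum of an F matrix with ratios $y_1 = p/n$, $y_2 = p/m$, which is quoted from the general literature rather than from this section.

```python
import math


def mu_trig(m, n, p):
    """mu_p = (m/n) * tan^2(alpha_p + beta_p), the form (5.2), for p < m."""
    alpha = math.asin(math.sqrt(p / (m + n)))
    beta = math.asin(math.sqrt(n / (m + n)))
    return (m / n) * math.tan(alpha + beta) ** 2


def mu_closed(m, n, p):
    """Square-root form of mu_p obtained in the proof (p < m)."""
    num = math.sqrt((m + n - p) * n) + math.sqrt(m * p)
    den = math.sqrt(m * (m + n - p)) - math.sqrt(n * p)
    return (m / n) * (num / den) ** 2


def f_edge(m, n, p):
    """Right edge of the limiting F-matrix spectrum, y1 = p/n, y2 = p/m < 1."""
    y1, y2 = p / n, p / m
    return ((1 + math.sqrt(y1 + y2 - y1 * y2)) / (1 - y2)) ** 2


def inv_sigma_cubed(m, n, p):
    """1 / sigma_p^3 in the trigonometric form (5.3), for p < m."""
    alpha = math.asin(math.sqrt(p / (m + n)))
    beta = math.asin(math.sqrt(n / (m + n)))
    denom = (math.sin(2 * beta) * math.sin(2 * alpha)
             * math.sin(2 * beta + 2 * alpha) ** 2)
    return mu_trig(m, n, p) ** 3 * 16 * n ** 2 / ((m + n) ** 2 * denom)


cases = [(10, 5, 2), (4, 4, 1), (50, 30, 20)]
values = [(mu_trig(*c), mu_closed(*c), f_edge(*c)) for c in cases]
```

All three expressions for the centering constant agree on each triple `(m, n, p)`, which is consistent with $\mu_p$ being the right end point of the limiting spectrum.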


6 Proof of Part (ii) of Theorem 2.1: Standard Gaussian Distribution


This section is to consider the case when $\{X_{ij}\}$ follow the normal distribution with mean zero and variance one. We below first introduce more notation. Let $A = (A_{ij})$ be a matrix. We define the following norms

$$\|A\| = \max_{|x|=1}|Ax|, \qquad \|A\|_\infty = \max_{i,j}|A_{ij}|, \qquad \|A\|_F = \Big(\sum_{i,j}|A_{ij}|^2\Big)^{1/2},$$

where $|x|$ represents the Euclidean norm of a vector $x$. Notice that we have a simple relationship among these norms

$$\|A\|_\infty \le \|A\| \le \|A\|_F.$$

We also need the following commonly used definition about stochastic domination to simplify the 
statements. 


Definition 2. (Stochastic domination) Let

$$\xi = \{\xi^{(n)}(u): n\in\mathbb N, u\in U^{(n)}\}, \qquad \zeta = \{\zeta^{(n)}(u): n\in\mathbb N, u\in U^{(n)}\}$$

be two families of random variables, where $U^{(n)}$ is an $n$-dependent parameter set (or independent of $n$). If for sufficiently small positive $\varepsilon$ and sufficiently large $\sigma$,

$$\sup_{u\in U^{(n)}}P\big[|\xi^{(n)}(u)| > n^{\varepsilon}|\zeta^{(n)}(u)|\big] \le n^{-\sigma}$$

for large enough $n\ge n(\varepsilon,\sigma)$, then we say that $\zeta$ stochastically dominates $\xi$ uniformly in $u$. We denote this relationship by $|\xi|\prec\zeta$ and also write it as $\xi = O_\prec(\zeta)$. Furthermore we also write $|x|\prec y$ if $x$ and $y$ are both nonrandom and $|x|\le n^{\varepsilon}|y|$ for sufficiently small positive $\varepsilon$.
Proof. We start the proof by reminding readers that $m < p$ and $m + n > p$. Since $m < p$ the limit of the empirical distribution function of $\frac1pY^*Y$ is the MP law and we denote its density by $\rho_{pm}(x)$. We define $\gamma_{m,1}\ge\gamma_{m,2}\ge\cdots\ge\gamma_{m,m}$ to satisfy

$$\int_{\gamma_{m,j}}^{+\infty}\rho_{pm}(x)dx = \frac jm, \tag{6.1}$$

with $\gamma_{m,0} = (1 + \sqrt{m/p})^2$, $\gamma_{m,m} = (1 - \sqrt{m/p})^2$. Correspondingly denote the eigenvalues of $\frac1pY^*Y$ by $\hat\gamma_{m,1}\ge\hat\gamma_{m,2}\ge\cdots\ge\hat\gamma_{m,m}$. Here we would remind the readers that $\rho_{pm}(x)$, $\gamma_{m,j}$, $\hat\gamma_{m,j}$ are similar to those in (2.3), below (2.3) and above (4.3) except that we are interchanging the roles of $p$ and $m$, because we are considering $\frac1pY^*Y$ rather than $\frac1mYY^*$. Moreover, as in (4.35) and (4.22), for any sufficiently small $\zeta > 0$ and big $D > 0$ there exists an event (with a slight abuse of notation still denoted by $S_\zeta$) such that

$$S_\zeta = \{\forall j,\ 1\le j\le m,\ |\hat\gamma_{m,j} - \gamma_{m,j}| \le p^{\zeta - 2/3}\tilde j^{-1/3}\} \tag{6.2}$$

and

$$P(S_\zeta^c) \le p^{-D}. \tag{6.3}$$

Note that $\frac1pYY^*$ and $\frac1pY^*Y$ have the same nonzero eigenvalues. To simplify notation let $m_p = m + n - p$. Write


-YY* = U* 

p 



with D = diag{^ mt i, 7 m, 2 ; ■ ■ • ,7m,m} and U is an orthogonal matrix. Then det(A 
is equivalent to 

- — U*X.X*U | = 0. 
m n 


YY* XX* 


(6.4) 


= 0 


•bp 


H 

( D 

° 

V 1 

^ o 

0 / 


Moreover, since $\{X_{ij}\}$ are independent standard normal random variables and $U$ is an orthogonal matrix we have $UX \stackrel{d}{=} X$, so that it suffices to consider the following determinant


det A 



-XX* | = 0. 

m„ 


(6.5) 


Here $\stackrel{d}{=}$ means having the identical distribution.


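Now rewrite $X$ as $X = \begin{pmatrix}X_1\\ X_2\end{pmatrix}$, where $X_1$ is an $m\times n$ matrix and $X_2$ is a $(p-m)\times n$ matrix. It follows that

$$XX^* = \begin{pmatrix}X_1X_1^* & X_1X_2^*\\ X_2X_1^* & X_2X_2^*\end{pmatrix} =: \begin{pmatrix}X_{11} & X_{12}\\ X_{21} & X_{22}\end{pmatrix}. \tag{6.6}$$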
(6.5) can be rewritten as

$$\det\begin{pmatrix}\frac1{m_p}X_{11} - \lambda D & \frac1{m_p}X_{12}\\ \frac1{m_p}X_{21} & \frac1{m_p}X_{22}\end{pmatrix} = 0.$$

Since $m + n > p$, $X_{22}$ is invertible. (6.5) is further equivalent to

$$\det\Big(\frac1{m_p}X_{11} - \lambda D - \frac1{m_p}X_{12}X_{22}^{-1}X_{21}\Big) = 0. \tag{6.7}$$

Moreover,

$$X_{11} - X_{12}X_{22}^{-1}X_{21} = X_1X_1^* - X_1X_2^*(X_2X_2^*)^{-1}X_2X_1^* = X_1\big(I_n - X_2^*(X_2X_2^*)^{-1}X_2\big)X_1^*.$$

Since $\mathrm{rank}\big(I_n - X_2^*(X_2X_2^*)^{-1}X_2\big) = m + n - p = m_p$ we can write

$$I_n - X_2^*(X_2X_2^*)^{-1}X_2 = V\begin{pmatrix}I_{m_p} & 0\\ 0 & 0\end{pmatrix}V^*,$$

where $V$ is an orthogonal matrix. In view of the above we can construct an $m\times m_p$ matrix $Z = (Z_{ij})_{m,m_p}$ consisting of independent standard normal random variables so that

$$X_{11} - X_{12}X_{22}^{-1}X_{21} = ZZ^*. \tag{6.8}$$

It follows that (6.7) and hence (6.5) are equivalent to

$$\det\Big(\frac1{m_p}ZZ^* - \lambda D\Big) = 0. \tag{6.9}$$
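The reduction from (6.5) to (6.9) rests on a Schur complement and on a projection of rank $m_p = m+n-p$. Both can be checked on a small example. The sketch below is illustrative only: the sizes, the seed, the diagonal matrix `D` and the value `lam` are arbitrary assumptions, and `Z` is built from an eigenbasis of the projection as one concrete way to realise (6.8).

```python
import numpy as np

rng = np.random.default_rng(1)
m, p, n = 3, 5, 6            # illustrative sizes with m < p and m + n > p
m_p = m + n - p
X = rng.standard_normal((p, n))
X1, X2 = X[:m, :], X[m:, :]

X11, X12 = X1 @ X1.T, X1 @ X2.T
X21, X22 = X2 @ X1.T, X2 @ X2.T

# Schur complement of X22 and its projection form X1 (I - P) X1^T.
schur = X11 - X12 @ np.linalg.solve(X22, X21)
proj = np.eye(n) - X2.T @ np.linalg.solve(X22, X2)
via_projection = X1 @ proj @ X1.T

# proj is an orthogonal projection of rank m_p; take an orthonormal basis V1
# of its range, so that proj = V1 V1^T and Z = X1 V1 is m x m_p.
eigvals, eigvecs = np.linalg.eigh(proj)
V1 = eigvecs[:, eigvals > 0.5]
Z = X1 @ V1

# Determinant factorisation behind the step from (6.5) to (6.7).
D = np.diag(rng.uniform(0.5, 2.0, size=m))   # hypothetical positive diagonal
lam = 0.7
block = np.block([[X11 / m_p - lam * D, X12 / m_p],
                  [X21 / m_p, X22 / m_p]])
det_block = np.linalg.det(block)
det_factored = np.linalg.det(X22 / m_p) * np.linalg.det(schur / m_p - lam * D)
```

Because the lower-right block has a nonzero determinant, the block determinant vanishes exactly when the Schur-complement determinant does, which is the equivalence used in the text.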


It then suffices to consider the largest eigenvalue of $\frac1{m_p}D^{-1}ZZ^*$. Denote by $\lambda_1$ the largest eigenvalue of $\frac1{m_p}D^{-1}ZZ^*$. As in (4.3) and (4.4) define $c_m\in[0,\hat\gamma_{m,m})$ to satisfy

$$\frac1m\sum_{j=1}^m\Big(\frac{c_m}{\hat\gamma_{m,j} - c_m}\Big)^2 = \frac{m_p}m \tag{6.10}$$

and $\hat\mu_m$ and $\hat\sigma_m$ by

$$\hat\mu_m = \frac1{c_m}\Big(1 + \frac1{m_p}\sum_{j=1}^m\frac{c_m}{\hat\gamma_{m,j} - c_m}\Big), \qquad \frac1{\hat\sigma_m^3} = \frac1{c_m^3}\Big(1 + \frac1{m_p}\sum_{j=1}^m\Big(\frac{c_m}{\hat\gamma_{m,j} - c_m}\Big)^3\Big).$$

From Lemma 2 we have on the event $S_\zeta$

$$\limsup_p\frac{c_m}{\hat\gamma_{m,m}} < 1, \tag{6.11}$$

which implies condition (4.41). It follows from Theorem 1.3 of [3] and Theorem 2.18 of [15] that

$$\lim_{p\to\infty}P\big(\hat\sigma_m(m+n-p)^{2/3}(\lambda_1 - \hat\mu_m)\le s\big) = F_1(s). \tag{6.12}$$

As in the proof of Theorem 2.1, by Lemmas 1 and 2 one may further conclude that

$$\lim_{p\to\infty}P\big(\sigma_p(m+n-p)^{2/3}(\lambda_1 - \mu_p)\le s\big) = F_1(s). \tag{6.13}$$

□
7 Proof of Part (ii) of Theorem 2.1: General distributions


The aim of this section is to relax the Gaussian assumption on $X$. We below assume that $X$ and $Y$ are real matrices. The complex case can be handled similarly and hence we omit it here. In the sequel, we absorb $\frac1{\sqrt{m+n-p}}$ and $\frac1{\sqrt p}$ into $X$ and $Y$ respectively (i.e. $\mathrm{Var}(X_{ij}) = \frac1{m+n-p}$, $\mathrm{Var}(Y_{st}) = \frac1p$) for convenience.

In terms of the notation in this section ($\mathrm{Var}(Y_{st}) = \frac1p$), (6.4) can be rewritten as

$$YY^* = U^*\begin{pmatrix}D & 0\\ 0 & 0\end{pmatrix}U.$$

Break $U$ as $U = \begin{pmatrix}U_1\\ U_2\end{pmatrix}$, where $U_1$ and $U_2$ are $m\times p$ and $(p-m)\times p$ respectively. By (6.4)-(6.7) (note that here we cannot omit $U$ by $UX \stackrel{d}{=} X$), the largest root of $\det(\lambda YY^* - XX^*) = 0$ equals the largest eigenvalue of the following matrix

$$\begin{aligned}
A &= D^{-1/2}U_1X\big(I - X^TU_2^T(U_2XX^TU_2^T)^{-1}U_2X\big)X^TU_1^TD^{-1/2} \\
&= D^{-1/2}U_1X\big(I - P_{X^TU_2^T}\big)X^TU_1^TD^{-1/2},
\end{aligned} \tag{7.1}$$

where $P_{X^TU_2^T}$ is the projection matrix. It is not necessary to assume that $U_2XX^TU_2^T$ is invertible, since $P_{X^TU_2^T}$ is unique even if $(U_2XX^TU_2^T)^{-}$ is a generalized inverse of $U_2XX^TU_2^T$. Moreover we indeed have the following lemma to control the smallest eigenvalue of $U_2XX^TU_2^T$.

Lemma 3. Suppose that $(m+n-p)^{1/2}X$ satisfies Condition 1. Then $U_2XX^TU_2^T$ is invertible and

$$\|(U_2XX^TU_2^T)^{-1}\| \le M \tag{7.2}$$

for a large constant $M$ with high probability. Moreover,

$$\|XX^*\| \le M \tag{7.3}$$

with high probability under the conditions in Theorem 2.1.

Proof. One may check that the conditions in Theorem 3.12 in [15] are satisfied when considering 
U 2 XX 7 Uj. Applying Theorem 3.12 in [15] then yields 

|A min (u 2 XX T Uj) - (1 - ^) 2 | A n~ 2 '\ 

V / Y p — m 

where (1 — \J can be obtained when considering the special case when the entries of X are 

Gaussian. As for (7.3) see Lemma 3.9 in [7]. □ 











Since the matrix in (7.1) is quite complicated we construct a linearization matrix for it:

$$H = H(X) = \begin{pmatrix}-zI & 0 & D^{-1/2}U_1X\\ 0 & 0 & U_2X\\ X^TU_1^TD^{-1/2} & X^TU_2^T & -I\end{pmatrix}. \tag{7.4}$$


The connection between $H$ and the matrix in (7.1) is that, by a simple calculation, the upper left block of the $3\times3$ block matrix $H^{-1}$ is the Green function of (7.1), whose normalized trace is the Stieltjes transform of (7.1). We next give the limit of the Stieltjes transform of (7.1) and need the following well-known result (see [1]). There exists a unique solution $m(z): \mathbb C^+\to\mathbb C$ such that

$$\frac1{m(z)} = -z + \frac m{m+n-p}\int\frac t{1 + tm(z)}dH_n(t), \tag{7.5}$$

where $H_n$ is the empirical distribution function of $D^{-1}$. Moreover, we set




m{z) =—Tr(z{l + m{z)D x )) 


p{x) = lim ^sm(z). 
z£C+—>x 


From the end of the last section we see that in the Gaussian case (7.1) $\stackrel{d}{=} D^{-1/2}ZZ^*D^{-1/2}$. Hence it is easy to see that $\hat\mu_m$ defined above (6.11) is the rightmost end point of the support of $\rho(x)$.
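As a numerical sanity check on the linearization (7.4), one can compare the upper left $m\times m$ block of $H^{-1}$ with the resolvent of the matrix in (7.1), and also check a block decomposition of the form $G = A_3 + A_2G_mA_2^T$. The sketch below is illustrative only: the sizes, the seed, the orthogonal matrix `U`, the diagonal `d` of $D$ and the spectral parameter `z` are arbitrary assumptions, and `T1`, `T2`, `Gamma` are shorthand names introduced here for $D^{-1/2}U_1X$, $U_2X$ and $(U_2XX^TU_2^T)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, p, n = 3, 5, 7
X = rng.standard_normal((p, n)) / np.sqrt(m + n - p)
U, _ = np.linalg.qr(rng.standard_normal((p, p)))    # a fixed orthogonal matrix
U1, U2 = U[:m, :], U[m:, :]
d = rng.uniform(0.5, 2.0, size=m)                   # hypothetical diagonal of D
z = 0.3 + 0.2j

T1 = np.diag(d ** -0.5) @ U1 @ X                    # D^{-1/2} U_1 X
T2 = U2 @ X                                         # U_2 X
Gamma = np.linalg.inv(T2 @ T2.T)
P = T2.T @ Gamma @ T2                               # projection onto rows of T2
A = T1 @ (np.eye(n) - P) @ T1.T                     # the matrix in (7.1)

H = np.block([
    [-z * np.eye(m), np.zeros((m, p - m)), T1],
    [np.zeros((p - m, m)), np.zeros((p - m, p - m)), T2],
    [T1.T, T2.T, -np.eye(n)],
])
G = np.linalg.inv(H)
G_m = np.linalg.inv(A - z * np.eye(m))              # resolvent of (7.1)

# Block decomposition G = A3 + A2 G_m A2^T obtained from Schur complements.
A2 = np.vstack([np.eye(m), -Gamma @ T2 @ T1.T, (np.eye(n) - P) @ T1.T])
A3 = np.zeros((p + n, p + n))
A3[m:p, m:p] = Gamma
A3[m:p, p:] = Gamma @ T2
A3[p:, m:p] = T2.T @ Gamma
A3[p:, p:] = -np.eye(n) + P
```

Working with the $(p+n)\times(p+n)$ matrix $H$ is convenient because $H$ is linear in $X$, whereas the matrix in (7.1) involves $X$ through a projection.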

For any small positive constant $\tau$ we define the domains

$$\mathbf E(\tau,n) = \{z = E + i\eta\in\mathbb C^+: |z|\ge\tau,\ |E|\le\tau^{-1},\ n^{-1+\tau}\le\eta\le\tau^{-1}\}, \tag{7.6}$$

$$\mathbf E_+ = \mathbf E_+(\tau,\tau',n) = \{z\in\mathbf E(\tau,n): E\ge\hat\mu_m - \tau'\}, \tag{7.7}$$

where $\tau'$ is a sufficiently small positive constant.

Set

$$\Psi = \Psi(z) = \sqrt{\frac{\mathrm{Im}\,m(z)}{n\eta}} + \frac1{n\eta}, \qquad G(z) = H^{-1}, \qquad \Sigma = \Sigma(z) = z^{-1}(1 + m(z)D^{-1})^{-1}. \tag{7.8}$$

To calculate an explicit expression of $G(z)$ we need the following well-known formula:



We next develop the explicit expression of $G(z)$. Denote the spectral decomposition of $A_1 = D^{-1/2}U_1X(I - P_{X^TU_2^T})X^TU_1^TD^{-1/2}$ by

$$A_1 = V\Lambda V^T = \sum_{k=1}^m\lambda_kv_kv_k^T,$$

where

$$\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_{m+n-p}\ge0 = \lambda_{m+n-p+1} = \cdots = \lambda_m.$$

It follows that

$$G_{ij} = \sum_{k=1}^m\frac{v_k(i)v_k(j)}{\lambda_k - z}, \qquad 1\le i,j\le m, \tag{7.10}$$

where $G_{ij}$ denotes the $(i,j)$th entry of the matrix $G(z)$ and $v_k(i)$ means the $i$th component of the vector $v_k$. We denote $(G_{ij})_{1\le i,j\le m}$ by $G_m$, which is the Green function of (7.1). Moreover, let

$$A_2 = \big(I,\ -D^{-1/2}U_1XX^TU_2^T\Gamma,\ D^{-1/2}U_1X(I - P_{X^TU_2^T})\big)^T$$

and

$$A_3 = \begin{pmatrix}0 & 0 & 0\\ 0 & \Gamma & \Gamma U_2X\\ 0 & X^TU_2^T\Gamma & -I + P_{X^TU_2^T}\end{pmatrix},$$

where $\Gamma = (U_2XX^TU_2^T)^{-1}$. Applying (7.9) twice implies that

$$G(z) = A_3 + \sum_{k=1}^m\frac{A_2v_kv_k^TA_2^T}{\lambda_k - z} = A_3 + A_2G_mA_2^T. \tag{7.11}$$


To control the inverse of a matrix in the projection matrix we introduce the following smooth cutoff function

$$\chi(x) = \begin{cases}1 & \text{if } |x|\le M_1n^{-2},\\ 0 & \text{if } |x|\ge2M_1n^{-2},\end{cases} \tag{7.12}$$

whose derivatives satisfy $|\chi^{(k)}|\le Mn^{2k}$, $k = 1,2,\dots$, and $M_1$ is some positive constant. Let $\lambda_1\ge\cdots\ge\lambda_{p-m}$ be the eigenvalues of $U_2XX^TU_2^T$ and $s(z)$ be the Stieltjes transform of its ESD. Since

$$\mathrm{Im}(s(in^{-2})) = (p-m)^{-1}\sum_{i=1}^{p-m}\frac{n^{-2}}{\lambda_i^2 + n^{-4}},$$

we conclude that

$$\text{if } \mathrm{Im}(s(in^{-2}))\le M_1n^{-2}, \text{ then } \lambda_{p-m}\ge M_2n^{-1/2} \tag{7.13}$$

for some positive constant $M_2$, which allows us to control the maximum eigenvalue of $(U_2XX^TU_2^T)^{-1}$ outside the event $\{\lambda_{p-m}\ge c\}$. Moreover, consider the event $\{\lambda_{p-m}\ge c\}$. By Lemma 3, choosing a sufficiently small constant $c$, we have

$$1 - o(n^{-l}) = P(\lambda_{p-m}\ge c)\le P\big(\mathrm{Im}(s(in^{-2}))\le M_1n^{-2}\big), \quad\text{for any positive integer } l. \tag{7.14}$$

Therefore, by Lemma 3 we have

$$P\big(\chi(\mathrm{Im}(s(in^{-2})))\ne1\big)\le o(n^{-l}), \quad\text{for any positive integer } l. \tag{7.15}$$

Similarly, by Lemma 3, for $\|X\|_F^2$, we have

$$P\big(\chi(n^{-3}\|X\|_F^2)\ne1\big)\le o(n^{-l}), \quad\text{for any positive integer } l. \tag{7.16}$$

Set $T_n(X) = \chi(\mathrm{Im}(s(in^{-2})))\chi(n^{-3}\|X\|_F^2)$, and

$$F(z) = \begin{pmatrix}-\Sigma & \Sigma D^{-1/2}U_1XX^TU_2^T\Gamma & 0\\ \Gamma U_2XX^TU_1^TD^{-1/2}\Sigma & \Gamma - \Gamma U_2XX^TU_1^TD^{-1/2}\Sigma D^{-1/2}U_1XX^TU_2^T\Gamma & \Gamma U_2X\\ 0 & X^TU_2^T\Gamma & (zm(z)+1)(I - P_{X^TU_2^T})\end{pmatrix}. \tag{7.17}$$

In fact, $F(z)$ is close to $G(z)$ with high probability. In view of (7.15) and (7.16) it is straightforward to see that

$$T_n(X) = 1 \tag{7.18}$$

with high probability and we will use it frequently without mention.

We are now in a position to state our main result about the local law near $\hat\mu_m$, the right end point of the support of the limit of the ESD of $A$ in (7.1).

Theorem 7.1. (Strong local law) Suppose that $(m+n-p)^{1/2}X$ and $p^{1/2}Y$ satisfy the conditions of Theorem 2.1. Then

(i) For any deterministic unit vectors $v, w\in\mathbb R^{p+n}$

$$\langle v,(G(z) - F(z))w\rangle\prec\Psi \tag{7.19}$$

uniformly in $z\in\mathbf E_+$ and

(ii)

$$|m_n(z) - m(z)|\prec\frac1{n\eta} \tag{7.20}$$

uniformly in z E E + , where rn n (z) = X Gu. 

7.1 Local law (7.19)

The aim of this subsection is to prove (7.19). Before proving (7.19) we first collect some frequently used bounds below. Recall the definition of $m(z)$ in (7.5). For $z\in\mathbf E(\tau,n)$ one may verify that

$$M_2\le|m(z)|\le M_1 \tag{7.21}$$

and

$$\mathrm{Im}(m(z))\ge M\eta \tag{7.22}$$

(see Lemma 2.3 in [4] or Lemma 3.1 and Lemma 3.2 in [20]). Order the eigenvalues of $D^{-1}$ as $d_1\ge d_2\ge\cdots\ge d_m$. From (6.10) and (6.11) we conclude that on the event $S_\zeta$ defined in (6.2)

$$\limsup_pc_md_1 < 1. \tag{7.23}$$

Here we remind the readers that $d_1$ corresponds to $\frac1{\hat\gamma_{m,m}}$ there, that the validity of (7.23) does not depend on the Gaussian assumption there, and that we do not assume the entries of $Y$ to be Gaussian in the last section. In addition, with probability one

$$c_m = -\lim_{z\in\mathbb C^+\to\hat\mu_m}m(z) \tag{7.24}$$

(one may see below (1.8) in [4] or [20]). It follows from (7.23) and (7.24) that for $z\in\mathbf E_+$ on the event $S_\zeta$

$$|1 + dm(z)|\ge\tau_2, \qquad d\in[d_m,d_1], \tag{7.25}$$

for some positive constant $\tau_2$ (one may also see (iv) of Lemma 2.3 of [4]). We then conclude from (7.21) and (7.25) that on the event $S_\zeta$

$$\|\Sigma\| = \|\Sigma(z)\|\le M, \tag{7.26}$$

where $\Sigma = \Sigma(z)$ is defined in (7.8). Moreover, for $z\in\mathbf E_+$ it follows from Lemma 3, (7.3), and (7.21)-(7.25) that

$$\|F(z)\|\prec1, \qquad \|A_2\|\prec1, \qquad \|A_3\|\prec1. \tag{7.27}$$

We further introduce more notation with bold lower indices:

$$G_{\mathbf vs} = \langle v,Ge_s\rangle, \qquad G_{\mathbf v\mathbf w} = \langle v,Gw\rangle, \qquad G_{s\mathbf v} = \langle e_s,Gv\rangle,$$

where $e_s$ is the unit vector with the $s$-th coordinate equal to 1. In the sequel, if the lower index of a matrix is bold, then it represents the inner product above; otherwise it means one entry of the corresponding matrix. Fix $\tau > 0$. For any $z\in\mathbf E(\tau,n)$ we claim that

||G(z)Tn(X)|| <Cn w r,-\ ||^G(z)r„(X)|| < C^V 2 , (7.28) 

l|G(z)|| -< rj- 1 , H^G^Ht,- 2 , (7.29) 

^ |G V j| 2 =-||F(z)7;(X)/(5 s )|| < CnV 1 (7.30) 

i= 1 d 
and


-! — + 1, (7.31) 

V 

where and in what follows $I(\cdot)$ denotes an indicator function. Indeed, the estimates (7.28) follow from (7.11) and the definition of $T_n(X)$ directly. (7.29) and (7.31) about the partial order follow from Lemma 3, (7.3) and (7.11). The first equality in (7.30) is straightforward and the second one is from the definition of $T_n(X)$ directly.

In the Gaussian case Theorem 7.1 can be obtained from Theorem 2.10 of [7]. Indeed, from (7.11) one can see a key observation that each block of $G(z)$ can be represented as a linear combination of the blocks of (3.3) in [15] in the Gaussian case. We now demonstrate such an observation by looking at two block matrices of $G(z)$; the other blocks can be checked similarly. For example, $G(z)$ has a block matrix $G_mD^{-1/2}U_1XX^TU_2^T\Gamma$. Note that $U_1XX^TU_2^T\Gamma$ is independent of $G_m$ given $U_2X$ due to $(I - P_{X^TU_2^T})X^TU_2^T = 0$, while from the end of the last section we see that

$$A \stackrel{d}{=} D^{-1/2}ZZ^*D^{-1/2} \tag{7.32}$$

given $U_2X$ in the Gaussian case (see (7.1) for the definition of $A$). It follows that this block can be regarded as the product of the random $G_m$ and a nonrandom matrix given $U_2X$. So the local law holds for this block from Theorem 2.10 of [7] by absorbing the nonrandom matrix into the fixed vector $v$ or $w$ (note that (7.25) is required in the conditions of Theorem 2.10 of [7]). A second block matrix of $G(z)$ is $G_mT$ with $T = D^{-1/2}U_1X(I - P_{X^TU_2^T})$. From the end of the last section and (7.32) we see that $G_m = (TT^* - zI)^{-1}$, since $I - P_{X^TU_2^T}$ is a projection matrix, so that this block is just one of the blocks in (3.3) in [15].


7.1.1 Proving (7.19) for general distributions 


We next prove (7.19) for general distributions by fixing $Y$ first, since $X$ and $Y$ are independent (the dominated convergence theorem then ensures (7.19)). However, to simplify notation we drop the statements about conditioning on $Y$ as well as the event $S_\zeta$. In other words, whenever we come across expectations they should be understood as conditional expectations and involve $I(S_\zeta)$. For example, (7.38) below should be understood as follows:

$$E\big[|F_{ab}(X,z)|^{2q}I(S_\zeta)\big]\le(n^{24\delta}\Psi)^{2q}.$$

In order to prove Theorem 7.1, it suffices to show that for any deterministic orthogonal matrices $V_1$ and $V_2$, we have

$$\|V_1(G(z) - F(z))V_2^T\|_\infty\prec\Psi \tag{7.33}$$

for all $z\in\mathbf E_+$. We define $S$ to be an $\varepsilon$-net of $\mathbf E(\tau,n)$ with $\varepsilon = n^{-10}$ and the cardinality of $S$, $|S|$, not bigger than $n^{30}$. Note that the function $D^{1/2}(G(z) - F(z))D^{1/2}$ is Lipschitz continuous with respect to the operator norm in $\mathbf E_+$ and the Lipschitz constant is $Mn^2\|XX^*\| + Mn^2\|1/\lambda_{\min}(U_2X)\|$. By (7.3) and Lemma 3 it then suffices to focus on $S$ to prove Theorem 7.1.

Following [7], the main idea of the proof is an induction argument from bigger imaginary parts to smaller imaginary parts. Set $\delta$ to be a sufficiently small positive constant such that $n^{24\delta}\Psi\ll1$. For any given $\eta\ge\frac1n$, we define a sequence of numbers $\eta_0\le\eta_1\le\eta_2\le\cdots\le\eta_L$ with

$$\eta_l = \eta n^{l\delta}\ (l = 0,1,\dots,L-1), \qquad \eta_L = 1, \tag{7.34}$$

where

$$L = L(\eta) = \max\{l\in\mathbb N: \eta n^{l\delta} < n^{-\delta}\}.$$

One can see that $L\le\delta^{-1} + 1$ by the definition. From now on we will work on the net $S$ containing the points $E + i\eta_l\in S$, $l = 0,\dots,L$. Moreover define $S_k = \{z\in S: \mathrm{Im}\,z\ge n^{-\delta k}\}$ and the sequences of properties

$$B_k = \{\|V_1(G(z) - F(z))V_2^T\|_\infty\prec1 \text{ for any } z\in S_k\}, \tag{7.35}$$

$$C_k = \{\|V_1(G(z) - F(z))V_2^T\|_\infty\prec n^{24\delta}\Psi \text{ for any } z\in S_k\}. \tag{7.36}$$

We start the induction by considering property $B_0$. We claim that the property $B_0$ holds. Indeed we conclude from (7.11) and (7.27) that

$$\|V_1(G(z) - F(z))V_2^T\|_\infty\prec\|G_m(z)\| + \|F(z)\| + 1\prec1,$$

as claimed. Moreover it is easy to see that property $C_k$ implies property $B_k$ by the choice of $\delta$ such that $n^{24\delta}\Psi\ll1$. We next prove that property $B_{k-1}$ implies property $C_k$ for any $1\le k\le\delta^{-1}$. If this is true then the induction is complete and (7.33) holds for all $z\in S$.

To this end, we calculate the higher moments of the following function

$$F_{ab}(X,z) = \big((J_1G(z)J_2^T)_{ab} - (J_1F(z)J_2^T)_{ab}\big)T_n(X), \tag{7.37}$$

where $J_1, J_2\in L = \{I, A, V\}$, $A$ is defined in (7.51) below and $V$ is any deterministic orthogonal matrix. Lemma 4 below, Markov's inequality and (7.18) then ensure that property $B_{k-1}$ implies property $C_k$.

Lemma 4. Let $q$ be a positive constant and $k\le\delta^{-1}$. Suppose that property $B_{k-1}$ in (7.35) holds. Then

$$E\big[|F_{ab}(X,z)|^{2q}\big]\le(n^{24\delta}\Psi)^{2q} \tag{7.38}$$

for all $1\le a,b\le n+p$ and $z\in S_k$.

The proof will be complete if we prove Lemma 4. Before proceeding, we present a simple but frequently used lemma which helps us transfer the partial order of two random variables to the partial order of their expectations.

Lemma 5. Let $\zeta$ be a random variable satisfying $\zeta\prec\nu$, where the positive $\nu$ may be random or deterministic. Suppose $|\zeta|\le n^{M_0}$ for some positive constant $M_0$. Then

$$E\zeta\prec(E\nu + n^{M_0-D}), \tag{7.39}$$

where $D$ is a sufficiently large positive constant.

Proof. Since $\zeta\prec\nu$ there exist a sufficiently small positive $\varepsilon$ and a sufficiently large $D$ so that

$$P(\zeta > n^{\varepsilon}\nu)\le n^{-D}.$$

Define the event $A_\varepsilon = \{\zeta\le n^{\varepsilon}\nu\}$. Write

$$|E\zeta|\le|E\zeta I(A_\varepsilon)| + |E\zeta I(A_\varepsilon^c)|\le n^{\varepsilon}E\nu + n^{M_0}P(A_\varepsilon^c)\le n^{\varepsilon}E\nu + n^{M_0-D}.$$

□


We now claim that

$$E\big(|F_{ab}(X^0,z)|^{2q}\big)\le(n^{24\delta}\Psi)^{2q} \tag{7.40}$$

if $X$ in Lemma 4 is replaced by the corresponding Gaussian random matrix $X^0 = (X^0_{i\mu}) = X^{\mathrm{Gauss}}$ consisting of Gaussian random variables with mean zero and variance one. Indeed, one can see that $|F_{ab}(X^0,z)|^{2q}\prec\Psi^{2q}$ from the paragraph containing (7.32). To apply (7.39) to conclude the claim we need $|F_{ab}(X^0,z)|\le n^{M_0}$, which follows immediately from the first estimate in (7.28) and the second estimate in (7.30).

7.1.2 Proving Lemma 4 by the interpolation method 

We next finish Lemma 4 for the general distributions by the interpolation method developed by [15]. To this end we need to define the interpolation matrix $X^t$ between $X^1 = (X^1_{i\mu}) = X$ and $X^0$. For $1\le i\le p$ and $1\le\mu\le n$, denote the distribution function of the random variables $X^u_{i\mu}$ by $F^u_{i\mu}$ for $u = 0,1$. For $t\in[0,1]$, we define the interpolated distribution function by

$$F^t_{i\mu} = tF^1_{i\mu} + (1-t)F^0_{i\mu}. \tag{7.41}$$

Define the interpolation matrix $X^t = (X^t_{i\mu})$ with $F^t_{i\mu}$ being the distribution of $X^t_{i\mu}$, where $\{X^t_{i\mu}\}$ are independent for $i,\mu$. We furthermore introduce the matrix

$$X^{t,\lambda}_{(i\mu)} = \big(\lambda\,\mathbf 1\{(j,\nu) = (i,\mu)\} + X^t_{j\nu}\,\mathbf 1\{(j,\nu)\ne(i,\mu)\}\big)_{1\le j\le p,\,1\le\nu\le n}, \tag{7.42}$$

which differs from $X^t$ at the $(i,\mu)$ position only. We also define $G^t(z) = G(X^t,z)$ and $G^{t,\lambda}_{(i\mu)}(z) = G(X^{t,\lambda}_{(i\mu)},z)$, the analogues of $G(z)$ defined above (7.9), by replacing the random matrix $X$ in $G(z)$ with $X^t$ and $X^{t,\lambda}_{(i\mu)}$ respectively.

We now need the following interpolation formula; one may see Lemma 6.9 of [15].

Lemma 6. For any function $F: \mathbb R^{p\times n}\to\mathbb C$, we have

$$EF(X^1) - EF(X^0) = \int_0^1dt\sum_{i=1}^p\sum_{\mu=1}^n\Big[EF\big(X^{t,X^1_{i\mu}}_{(i\mu)}\big) - EF\big(X^{t,X^0_{i\mu}}_{(i\mu)}\big)\Big]. \tag{7.43}$$

To handle the right hand side of (7.43) we establish the following lemma.


Lemma 7. Fix a positive integer $q$ and $k\le\delta^{-1}$. Suppose that property $B_{k-1}$ holds. Then there exists some function $g_{ab}(\cdot,z)$ such that for $t\in[0,1]$, $u\in\{0,1\}$, $z\in S_k$,

p n 


i=l /x=l 


FCl^CXlW"’^)! 29 )) - E I^( X (£)^)| 2<Z =0((n 24 ^) 29 + ||EL(X t ,z)|| 00 ), (7.44) 


with the matrix L(X t ,^r) = ^l-F^X 4 , 2 : 


l<a,6<n+p 

Lemma 7 immediately implies that for z £ Sk 

p n 


EE e(i^(x;^,^)I 2 '') -E(|F ot (x;;J, 2 )| 2 '>) 


t.A'P 


= 0((n 24<5 T) 2 * + ||EL(X*, zJlloc). (7.45) 


*=1 /i=l 

To apply the above results we need the following Gronwall's inequality.

Lemma 8. Suppose that $\beta(t)$ is nonnegative and continuous and $u(t)$ is continuous. If for any $t\in I$, $\alpha(t)$ is nondecreasing and $u(t)$ satisfies the following inequality

$$u(t)\le\alpha(t) + \int_0^t\beta(s)u(s)ds,$$

then

$$u(t)\le\alpha(t)\exp\Big(\int_0^t\beta(s)ds\Big).$$

To apply Gronwall's inequality it is observed that
d 


max E\F ab (X t ,z)\ 2ci ') < max ^-E\F ab fX\ z)\ 2q . 
dt V l<s,t<n+p ) 1 <s,t<n+p dt 

From (7.43) and (7.45) we see that 

dE\F ab (^,z)\ 2q = 0((n 24 s^ 2 q + || EL(X *, z)^), 

if $F$ in (7.43) is taken as $|F_{ab}(\cdot,z)|^{2q}$. Gronwall's inequality and (7.40) imply that

e>E|F ab (X t ,z)| 2 9 < M ( n us^ 2 q + M ( max E |i 7 aii (x 0 , 2 )| 2(?N ) < M(n 245 'k) 2<? 

Ot V l<s,t<n-\-p J 

This, together with Lemma 6 and (7.40), implies that Lemma 4 holds. Similarly, for future use we point out that if $n^{24\delta}\Psi$ in (7.44) is replaced by $n^{\delta}\Psi^2$ and (7.40) is strengthened to

$$E\big(|F_{ab}(X^0,z)|^{2q}\big)\le(n^{\delta}\Psi^2)^{2q}, \tag{7.46}$$

then

$$E\big(|F_{ab}(X,z)|^{2q}\big)\le(n^{\delta}\Psi^2)^{2q}, \tag{7.47}$$

if the real part of $z$ is outside the support.

What remains is to prove Lemma 7, and we below consider the case $u = 1$ only ($u = 0$ is similar). We first develop a crude bound below so that we may use property $B_{k-1}$ in (7.35), which is the assumption of Lemma 7.

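Lemma 9. Suppose that property $B_{k-1}$ holds. Then for any unit vectors $v$ and $w$

$$\langle v,(G(z) - F(z))w\rangle = O_\prec(n^{2\delta})$$

for all $z\in S_k$.

Proof. Recall the definition of $\eta_l$ in (7.34). Note that $z_l = E + i\eta_l\in S_{k-1}$ for $l = 1,2,\dots,L$ when $z = E + i\eta\in S_k$. Hence (7.35) ensures that

$$\mathrm{Im}\,G_{\mathbf v\mathbf v}(E + i\eta_l)\prec|v|^2 + \mathrm{Im}\langle v,F(E + i\eta_l)v\rangle\prec|v|^2,$$

where the last $\prec$ follows from (7.27). We conclude the proof by Lemma 10 below. □

Lemma 10. For any $z\in S$ and $x, y\in\mathbb R^{p+n}$, we have

$$\langle x,(G(z) - F(z))y\rangle\prec n^{2\delta}\sum_{l=1}^{L(\eta)}\big(\mathrm{Im}\,G_{\mathbf x\mathbf x}(E + i\eta_l) + \mathrm{Im}\,G_{\mathbf y\mathbf y}(E + i\eta_l)\big) + |x||y|.$$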

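Proof. The proof of this lemma follows that of Lemma 6.12 in [15] closely. It follows from (7.11) and (7.27) that

$$|\langle x,(G(z) - F(z))y\rangle|\prec\sum_{k=1}^m\frac{\langle x,A_2v_k\rangle^2}{|\lambda_k - z|} + \sum_{k=1}^m\frac{\langle y,A_2v_k\rangle^2}{|\lambda_k - z|} + |x||y|.$$

We evaluate the first term below; the second term can be handled similarly. We introduce the index subsets

$$C_l = \{k: \eta_{l-1}\le|\lambda_k - E| < \eta_l\}, \qquad (l = 0,1,\dots,L+1),$$

where $\eta_{-1} = 0$ and $\eta_{L+1} = \infty$, so that we can rewrite the first term as follows:

$$\sum_{k=1}^m\frac{\langle x,A_2v_k\rangle^2}{|\lambda_k - z|} = \sum_{l=0}^{L+1}\sum_{k\in C_l}\frac{\langle x,A_2v_k\rangle^2}{|\lambda_k - z|}.$$

Consider the inner sum for $l\in\{1,2,\dots,L\}$:

$$\begin{aligned}
\sum_{k\in C_l}\frac{\langle x,A_2v_k\rangle^2}{|\lambda_k - z|} &\le\sum_{k\in C_l}\frac{\langle x,A_2v_k\rangle^2\eta_l}{(\lambda_k - E)^2}\le2\sum_{k\in C_l}\frac{\langle x,A_2v_k\rangle^2\eta_l}{(\lambda_k - E)^2 + \eta_{l-1}^2} \\
&\le2\frac{\eta_l}{\eta_{l-1}}\mathrm{Im}\,G_{\mathbf x\mathbf x}(E + i\eta_{l-1})\le2n^{\delta}\,\mathrm{Im}\,G_{\mathbf x\mathbf x}(E + i\eta_{l-1}).
\end{aligned} \tag{7.48}$$

Combining this with the fact that $y\,\mathrm{Im}\,G_{\mathbf x\mathbf x}(E + iy)$ is a nondecreasing function of $y$, we have

$$\sum_{k\in C_l}\frac{\langle x,A_2v_k\rangle^2}{|\lambda_k - z|}\le2n^{2\delta}\,\mathrm{Im}\,G_{\mathbf x\mathbf x}(E + i\eta_l).$$

Next, we consider the cases $l = 0$ and $l = L+1$:

$$\sum_{k\in C_0}\frac{\langle x,A_2v_k\rangle^2}{|\lambda_k - z|}\le2\sum_{k\in C_0}\frac{\langle x,A_2v_k\rangle^2\eta}{(\lambda_k - E)^2 + \eta^2}\le2\,\mathrm{Im}\,G_{\mathbf x\mathbf x}(E + i\eta)\le2n^{\delta}\,\mathrm{Im}\,G_{\mathbf x\mathbf x}(E + i\eta_1),$$

$$\sum_{k\in C_{L+1}}\frac{\langle x,A_2v_k\rangle^2}{|\lambda_k - z|}\le2\sum_{k\in C_{L+1}}\frac{\langle x,A_2v_k\rangle^2|\lambda_k - E|\eta_L}{(\lambda_k - E)^2 + \eta_L^2}\prec\sum_{k\in C_{L+1}}\frac{\langle x,A_2v_k\rangle^2\eta_L}{(\lambda_k - E)^2 + \eta_L^2}\le\mathrm{Im}\,G_{\mathbf x\mathbf x}(E + i\eta_L),$$

where we also use (7.3). □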


It is observed that Lemma 9 holds for the interpolation random matrix $X^t$ as well because from (7.41) one can see that the entries of $X^t$ are independent random variables with mean zero, variance one and finite moment. Recall the definitions of $J_i$, $i = 1, 2$ in (7.37). It follows that

$$\|J_1(G^t(z) - F(z))J_2^T\|_\infty \prec n^{2\delta}, \quad \text{for } z \text{ in } S_k. \tag{7.49}$$

Below we further generalize it so that (7.49) still holds even if any entry $X^t_{i\mu}$ of $G^t(z)$ is replaced by any other random variable of size not bigger than $n^{-1/2}$. From (7.42) write

$$X^{t,\lambda_1}_{(i\mu)} - X^{t,\lambda_2}_{(i\mu)} = (\lambda_1 - \lambda_2)e_ie_\mu^T.$$




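This, together with (7.4), yields that

$$H(X^{t,\lambda_1}_{(i\mu)}) - H(X^{t,\lambda_2}_{(i\mu)}) = \Delta^{\lambda_1,\lambda_2}_{(i\mu)}, \tag{7.50}$$

where $H(X^{t,\lambda_1}_{(i\mu)})$ is obtained from $H(X)$ in (7.4) with $X$ replaced by $X^{t,\lambda_1}_{(i\mu)}$ and

$$\Delta^{\lambda_1,\lambda_2}_{(i\mu)} = (\lambda_1 - \lambda_2)\big(e_{\mu+p}e_i^TA + A^Te_ie_{\mu+p}^T\big), \qquad A = \big(U_1^TD^{-1}\ \cdots\ 0\big), \tag{7.51}$$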


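where here and in the following $e_{\mu+p}$ is always $(n+p) \times 1$ and $e_i$ is $p \times 1$. Applying the formula $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ repeatedly we further obtain the following resolvent formula for any $H \in \mathbb{N}$,

$$G^{t,\lambda_1}_{(i\mu)} = G^{t,\lambda_2}_{(i\mu)} + \sum_{h=1}^{H}(-1)^h\,G^{t,\lambda_2}_{(i\mu)}\big(\Delta^{\lambda_1,\lambda_2}_{(i\mu)}G^{t,\lambda_2}_{(i\mu)}\big)^h + (-1)^{H+1}\,G^{t,\lambda_1}_{(i\mu)}\big(\Delta^{\lambda_1,\lambda_2}_{(i\mu)}G^{t,\lambda_2}_{(i\mu)}\big)^{H+1}, \tag{7.52}$$

recalling the definition of $G^{t,\lambda}_{(i\mu)}$ below (7.42). Here and below we drop the variable $z$ when there is no confusion but one should keep in mind that $z \in S_k$.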
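The finite resolvent expansion (7.52) is an exact algebraic identity, which can be confirmed on a toy example. The sketch below is only an illustration under stated assumptions: it uses hypothetical $2 \times 2$ symmetric matrices `H1`, `H2` in place of the linearizations in (7.4), and the helper names are not from the paper.

```python
def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def mat_axpy(A, B, c):
    """Return A + c*B entrywise."""
    return [[A[i][j] + c * B[i][j] for j in range(2)] for i in range(2)]


def resolvent(H, z):
    """(H - z)^{-1} for a 2x2 matrix H, using the explicit inverse formula."""
    a, b, c, d = H[0][0] - z, H[0][1], H[1][0], H[1][1] - z
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def expansion_rhs(H1, H2, z, order):
    """Right-hand side of the expansion of G1 = (H1 - z)^{-1} around G2 = (H2 - z)^{-1}."""
    G1, G2 = resolvent(H1, z), resolvent(H2, z)
    delta = mat_axpy(H1, H2, -1)        # Delta = H1 - H2
    step = mat_mul(delta, G2)           # Delta * G2
    power = [[1, 0], [0, 1]]            # (Delta * G2)^0
    total = [[0, 0], [0, 0]]
    for h in range(order + 1):          # h = 0 contributes the leading term G2
        total = mat_axpy(total, mat_mul(G2, power), (-1) ** h)
        power = mat_mul(power, step)
    # power is now (Delta * G2)^(order + 1); add the exact remainder term with G1 in front.
    return mat_axpy(total, mat_mul(G1, power), (-1) ** (order + 1))


H2 = [[1.0, 0.3], [0.3, -0.5]]          # assumed toy matrix
H1 = [[1.0, 0.35], [0.35, -0.5]]        # same matrix with one perturbed symmetric entry
z = 0.2 + 0.1j
lhs = resolvent(H1, z)
rhs = expansion_rhs(H1, H2, z, order=3)
print(max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2)))
```

The printed discrepancy should be at the level of floating-point rounding for any `order`, since the last term carries the full resolvent of the perturbed matrix.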

Lemma 11. Suppose that $\lambda$ is a random variable and satisfies $|\lambda| \prec n^{-1/2}$. Then

$$\|J_1(G^{t,\lambda}_{(i\mu)} - F)J_2^T\|_\infty \prec n^{2\delta}. \tag{7.53}$$

Proof. Recall (7.27),

$$\|F\| \prec 1.$$

It is easy to see that

$$\|A\| \le M,$$

which implies that

$$\|\Delta^{\lambda_1,\lambda_2}_{(i\mu)}\| \le M|\lambda_1 - \lambda_2|. \tag{7.54}$$

We next apply (7.52) with $\lambda_1 = \lambda$, $H = 11$ and $\lambda_2 = X^t_{i\mu}$ so that $G^{t,\lambda_2}_{(i\mu)} = G^t$. We conclude from (7.49) that

$$\|J_1G^{t,\lambda_2}_{(i\mu)}\| + \|G^{t,\lambda_2}_{(i\mu)}J_2^T\| \prec n^{2\delta}.$$

Note that $|\lambda_1 - \lambda_2| \prec n^{-1/2}$. Similar to the first inequality in (7.29), $G^{t,\lambda_1}_{(i\mu)}$ can be bounded by the imaginary part of $z$, i.e. $\|G^{t,\lambda_1}_{(i\mu)}\| = O_\prec(n)$. Summarizing the above we conclude Lemma 11.
In order to simplify the notations, recalling (7.37) we define 

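$$f_{(i\mu)}(\lambda) = |\mathcal{F}_{ab}(X^{t,\lambda}_{(i\mu)})|^{2q} = \Big(\mathcal{F}_{ab}(X^{t,\lambda}_{(i\mu)})\,\overline{\mathcal{F}_{ab}(X^{t,\lambda}_{(i\mu)})}\Big)^{q},$$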

where we omit some parameters. By Lemma 11 and (7.52) one can easily get the following Lemma. 

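Lemma 12. Suppose that $\lambda$ is a random variable and satisfies $|\lambda| \prec n^{-1/2}$. Then for any fixed integer $k$ we have

$$|f^{(k)}_{(i\mu)}(\lambda)| \prec n^{2\delta(2q+k)}, \tag{7.55}$$

where $f^{(k)}_{(i\mu)}(\lambda)$ denotes the $k$th derivative of $f_{(i\mu)}(\lambda)$ with respect to $\lambda$.
From Taylor's expansion and (7.55) when $|\lambda| \prec n^{-1/2}$ we have

$$f_{(i\mu)}(\lambda) = \sum_{k=0}^{8q}\frac{\lambda^k}{k!}f^{(k)}_{(i\mu)}(0) + O_\prec\big(n^{-4q-1/2+2\delta(10q+1)}\big). \tag{7.56}$$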


It follows from Lemma 12 and (7.39) that

$$E|\mathcal{F}_{ab}(X^t)|^{2q} - E|\mathcal{F}_{ab}(X^{t,0}_{(i\mu)})|^{2q} = Ef_{(i\mu)}(X^t_{i\mu}) - Ef_{(i\mu)}(0) = \frac{1}{2(m+n-p)}Ef^{(2)}_{(i\mu)}(0) + \sum_{k=4}^{8q}\frac{1}{k!}Ef^{(k)}_{(i\mu)}(0)\,E(X^t_{i\mu})^k + O_\prec\big(n^{-4q-1/2+2\delta(10q+1)}\big), \tag{7.57}$$

where we use $E(X^t_{i\mu})^k = 0$, $k = 1, 3$. To show (7.44), it suffices to prove that

$$n^{-k/2}\sum_{i=1}^{p}\sum_{\mu=1}^{n}Ef^{(k)}_{(i\mu)}(0) = O\big((n^{24\delta}\Psi)^{2q} + \|E|\mathcal{F}(X^t)|^{2q}\|_\infty\big) \tag{7.58}$$

for $k = 4, \ldots, 8q$. At this moment we would like to point out that $E|g_{ab}(X^t_{(i\mu)})|^{2q}$ in (7.44) equals

$$E|\mathcal{F}_{ab}(X^{t,0}_{(i\mu)})|^{2q} + \frac{1}{2(m+n-p)}Ef^{(2)}_{(i\mu)}(0).$$

We will not prove (7.58) directly. Instead we will prove the following claim in order to obtain a self-consistent estimation of $X^t$. We claim that if

$$n^{-k/2}\sum_{i=1}^{p}\sum_{\mu=1}^{n}Ef^{(k)}_{(i\mu)}(X^t_{i\mu}) = O\big((n^{24\delta}\Psi)^{2q} + \|E|\mathcal{F}(X^t)|^{2q}\|_\infty\big) \tag{7.59}$$

is true for $k = 4, \ldots, 16q$, then (7.58) holds for $k = 4, \ldots, 8q$. Indeed, in order to apply (7.59) to prove (7.58) we denote $f_{(i\mu)}$ and $X^t_{i\mu}$ by $f$ and $X$ respectively for simplicity. Similar to (7.57), by (7.55) we have

$$Ef^{(l)}(0) = Ef^{(l)}(X) - \sum_{k=1}^{16q-l}Ef^{(l+k)}(0)\frac{EX^k}{k!} + O_\prec\big(n^{l/2-1/2-8q+40\delta q}\big). \tag{7.60}$$


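It follows from (7.60) that

$$\begin{aligned}
Ef^{(k)}(0) &= Ef^{(k)}(X) - \sum_{\substack{k_1 \ge 1\\ k+k_1 \le 16q}}Ef^{(k+k_1)}(0)\frac{EX^{k_1}}{k_1!} + O_\prec\big(n^{k/2-1/2-8q+40\delta q}\big)\\
&= Ef^{(k)}(X) - \sum_{\substack{k_1 \ge 1\\ k+k_1 \le 16q}}Ef^{(k+k_1)}(X)\frac{EX^{k_1}}{k_1!} + \sum_{\substack{k_1, k_2 \ge 1\\ k+k_1+k_2 \le 16q}}Ef^{(k+k_1+k_2)}(0)\frac{EX^{k_1}}{k_1!}\frac{EX^{k_2}}{k_2!} + O_\prec\big(n^{k/2-1/2-8q+40\delta q}\big)\\
&= \sum_{r=0}^{16q-k}(-1)^r\sum_{\substack{k_1, k_2, \ldots, k_r \ge 1\\ k+\sum_i k_i \le 16q}}Ef^{(k+\sum_i k_i)}(X)\prod_{i=1}^{r}\frac{EX^{k_i}}{k_i!} + O_\prec\big(n^{k/2-1/2-8q+40\delta q}\big).
\end{aligned}$$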

This, together with (7.22) and the definition of $\Psi$ in (7.8), implies (7.58) immediately, as claimed. It then suffices to prove (7.59). Recall that

$$f^{(k)}_{(i\mu)}(X^t_{i\mu}) = \frac{\partial^k|\mathcal{F}_{ab}(X^t)|^{2q}}{\partial(X^t_{i\mu})^k}, \tag{7.61}$$

where $\mathcal{F}_{ab}(\cdot)$ is given in (7.37). Since $X^t = (X^t_{i\mu})$ is the only matrix we focus on we below use $X = (X_{i\mu})$ instead of $X^t = (X^t_{i\mu})$ to simplify notation because the entries of both of them have bounded higher moments. To prove (7.59) we need to study (7.61).


7.1.3 Estimate of higher order derivatives (7.61) in (7.59) 

We first look at the higher order derivatives of $(J_1F(z)J_2^T)_{ab}$ with respect to $X_{i\mu}$. Noting that $F(z)$ is a $3 \times 3$ block matrix we need to analyze the derivatives of $(J_1F(z)J_2^T)_{ab}$ block by block. It turns out that the higher order derivatives of $(J_1F(z)J_2^T)_{ab}$ are quite complicated even if we analyze them block by block. Fortunately, as will be seen, the exact expressions of the higher order derivatives of $(J_1F(z)J_2^T)_{ab}$ are not important. Moreover we claim an important fact that the higher order derivatives of $F(z)$ with respect to $X_{i\mu}$ can be generated by some sum or products of (part of) common matrices $U_1$, $U_2$, $X$, $e_ie_\mu^T$, $e_\mu e_i^T$, $X^T$, $T(X)$ (we call these common matrices atoms). Indeed, recalling $T(X) = (U_2XX^TU_2^T)^{-1}$ simple calculations indicate that

$$\frac{\partial XX^T}{\partial X_{i\mu}} = Xe_\mu e_i^T + e_ie_\mu^TX^T, \qquad \frac{\partial T(X)}{\partial X_{i\mu}} = -T(X)\big(U_2Xe_\mu e_i^TU_2^T + U_2e_ie_\mu^TX^TU_2^T\big)T(X). \tag{7.62}$$


It's easy to see that the first derivative of each block of $F(z)$ with respect to $X_{i\mu}$ can be constructed by sum or products of these atoms. Assuming that the $k$th derivative of each block of $F(z)$ is constructed by these atoms we find that the $(k+1)$th derivative of each block of $F(z)$ is also constructed by these atoms by (7.62). Based on the above fact we can describe the higher order derivatives of $(J_1F(z)J_2^T)_{ab}$ easier. By dropping $e_ie_\mu^T$ and $e_\mu e_i^T$ from the atoms we define the set

$$\mathcal{Q}(k) = \{\text{The matrices constructed from sum or product of (part of) } U_1, U_2, X, X^T, T(X)\}. \tag{7.63}$$

Any $k$th order derivative of each block of $F(z)$ with respect to $X_{i\mu}$ belongs to some product(s) between some matrices in $\mathcal{Q}(k)$ and $e_ie_\mu^T$ or $e_\mu e_i^T$.

Lemma 3 and (7.3) imply that $\|T(X)\| \le M$ and $\|XX^*\| \le M$ with high probability. Recalling (7.26), in view of the arguments above we conclude that for any $Q \in \mathcal{Q}(k)$,

$$\|Q\| \prec 1 \tag{7.64}$$

and the cardinality of $\mathcal{Q}(k)$ satisfies $|\mathcal{Q}(k)| \le M(k)$, where $M(k)$ is a constant depending on $k$. Moreover, for the function $\mathcal{T}_n(X)$, if $\mathcal{T}_n(X)$ is differentiated, then by simple and tedious calculations, from the definition of the smooth cutoff function, (7.15) and (7.16) we have


and 


D J 

l(L 


T n (X) 


A 0 



(7.65) 


(7.66) 


for any positive integer $l$ and sufficiently large $n$. The above properties about $\mathcal{T}_n(X)$ and the matrices belonging to $\mathcal{Q}(k)$ are enough for our proof below and we don't need to investigate the precise expression.

We next look at the higher order derivatives of $(J_1G(z)J_2^T)_{ab}$ with respect to $X_{i\mu}$. To characterize its higher order derivative conveniently we define group $g$ of size $k$ to be the set of paired indices:

$$g = \{a_1b_1, a_2b_2, \ldots, a_{k+1}b_{k+1}\},$$

where each of $\{a_j, b_j, j = 1, \ldots, k+1\}$ equals one of four letters $s$, $t$, $i$, $(\mu+p)$. Here we would remind readers that the size of group $g$ is defined to be $k$ instead of $(k+1)$ in order to simplify the argument below. Denote the size of the group $g$ by $k = k(g)$ and introduce the set $\mathfrak{G}_k = \{g : k(g) = k\}$ consisting of groups of size $k$. Moreover, we require each group in $\mathfrak{G}_k$ to satisfy three conditions specified below:

(i) $a_1 = a$ and $b_{k+1} = b$.

(ii) For $l \in [2, k+1]$ we have $a_l \in \{i, \mu+p\}$ and $b_{l-1} \in \{i, \mu+p\}$.

(iii) For $l \in [1, k]$ we have $b_la_{l+1} \in \{i(\mu+p), (\mu+p)i\}$.

As will be seen, groups $g$ are connected with the high order derivatives of $(J_1G(z)J_2^T)_{ab}$.

Moreover write $F(z) = \sum_{j=1}^{7}F_j(z)$ where each $F_j(z)$ corresponds to a non-zero block of $F(z)$. As before, to characterize the higher order derivative of each block conveniently we define groups $g^{(j)}$ of size $k$ to be the set of paired indices:

$$g^{(j)} = \{a_{j1}b_{j1}, a_{j2}b_{j2}, \ldots, a_{j(k+1)}b_{j(k+1)}\},$$

where each $a_{jm}$ and $b_{jm}$ equals $s$, $t$, $i$, $\mu$. Moreover introduce the set $\mathfrak{G}_{jk} = \{g^{(j)} : k(g^{(j)}) = k\}$ consisting of groups of size $k$. We require each group in $\mathfrak{G}_{jk}$ to satisfy conditions:

(i) $a_{j1} = a$ and $b_{j(k+1)} = b$.

(ii) For $l \in [2, k+1]$ we have $a_{jl} \in \{i, \mu\}$ and $b_{j(l-1)} \in \{i, \mu\}$.

(iii) For $l \in [1, k]$ we have $b_{jl}a_{j(l+1)} \in \{i\mu, \mu i\}$.

As will be seen groups $g^{(j)}$ are linked to the high order derivatives of $(J_1F(z)J_2^T)_{ab}$.

We below associate a random variable $B_{a,b,i,\mu}(g, g^{(1)}, \ldots, g^{(7)})$ with each group $g, g^{(j)}$, $j = 1, \ldots, 7$. When $k(g) = k(g^{(j)}) = 0$ we define

$$B_{a,b,i,\mu}(g, g^{(1)}, \ldots, g^{(7)}) = (J_1G(z)J_2^T)_{ab} - (J_1F(z)J_2^T)_{ab}.$$

When $k(g) \ge 1$ and $k(g^{(j)}) \ge 1$, define

$$B_{a,b,i,\mu,R_2,\ldots,R_k,\mathcal{R}_{11},\ldots,\mathcal{R}_{7(k+1)}}(g, g^{(1)}, \ldots, g^{(7)}) = C_{a,b,i,\mu,R_2,\ldots,R_k,\mathcal{R}_{11},\ldots,\mathcal{R}_{7(k+1)}}(g, g^{(1)}, \ldots, g^{(7)}) - \sum_{j=1}^{7}(J_1\mathcal{R}_{j1})_{(a_{j1}b_{j1})}(\mathcal{R}_{j2})_{(a_{j2}b_{j2})}\cdots(\mathcal{R}_{j(k+1)}J_2^T)_{(a_{j(k+1)}b_{j(k+1)})} \tag{7.67}$$

with

$$C_{a,b,i,\mu,R_2,\ldots,R_k,\mathcal{R}_{11},\ldots,\mathcal{R}_{7(k+1)}}(g, g^{(1)}, \ldots, g^{(7)}) = (J_1GA_5)_{(a_1b_1)}(R_2)_{(a_2b_2)}\cdots(R_k)_{(a_kb_k)}(A_4GJ_2^T)_{(a_{k+1}b_{k+1})}, \tag{7.68}$$

where $R_j$ ($2 \le j \le k$) has the expression of $R_j = A_4GA_5$ with $A_4 \in \{1, A\}$, $A_5 \in \{1, A^T\}$ and the non-zero block $\mathcal{R}_{jl}$ belongs to $\mathcal{Q}(k)$ in (7.63). Moreover the selection of 1 and $A$ in $A_4$ and $A_5$ is subject to the constraint that the total number of $A$ and $A^T$ contained in $C_{a,b,i,\mu,R_2,\ldots,R_k,\mathcal{R}_{11},\ldots,\mathcal{R}_{7(k+1)}}(g, g^{(1)}, \ldots, g^{(7)})$ is $k$. One should also notice that if $k(g) = 1$, the terms $R_j$ will disappear. It follows from (7.64) that

$$\|\mathcal{R}_{jl}\| \prec 1. \tag{7.69}$$


It is easy to see that

$$\frac{\partial G}{\partial X_{i\mu}} = -G\big(e_{\mu+p}e_i^TA + A^Te_ie_{\mu+p}^T\big)G, \tag{7.70}$$


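(one may see (7.50) for the derivative). We first demonstrate how to apply the above definitions about groups $g^{(j)}$ and $B_{a,b,i,\mu,R_2,\ldots,R_k,\mathcal{R}_{11},\ldots,\mathcal{R}_{7(k+1)}}(g, g^{(1)}, \ldots, g^{(7)})$ and hence write

$$\frac{\partial^k}{\partial(X_{i\mu})^k}\Big\{\big[(J_1G(z)J_2^T)_{ab} - (J_1F(z)J_2^T)_{ab}\big]\mathcal{T}_n(X)\Big\} = (-1)^k\sum_{\substack{g \in \mathfrak{G}_k,\ g^{(j)} \in \mathfrak{G}_{jk}\\ R_j,\ j = 2, \ldots, k\\ \mathcal{R}_{jl},\ j = 1, \ldots, 7,\ l = 1, \ldots, k+1}}B_{a,b,i,\mu,R_2,\ldots,R_k,\mathcal{R}_{11},\ldots,\mathcal{R}_{7(k+1)}}(g, g^{(1)}, \ldots, g^{(7)})\,\mathcal{T}_n(X) + O_\prec(0), \tag{7.71}$$

where the term $O_\prec(0)$ comes from the derivative on $\mathcal{T}_n(X)$ by (7.65), (7.29) and (7.27). To simplify the notations, we furthermore omit $R_2, \ldots, R_k$, $\mathcal{R}_{11}, \ldots, \mathcal{R}_{7(k+1)}$, $g^{(1)}, \ldots, g^{(7)}$ in the sequel and write

$$B_{a,b,i,\mu}(g) = B_{a,b,i,\mu,R_2,\ldots,R_k,\mathcal{R}_{11},\ldots,\mathcal{R}_{7(k+1)}}(g, g^{(1)}, \ldots, g^{(7)}), \tag{7.72}$$

$$C_{a,b,i,\mu}(g) = C_{a,b,i,\mu,R_2,\ldots,R_k,\mathcal{R}_{11},\ldots,\mathcal{R}_{7(k+1)}}(g, g^{(1)}, \ldots, g^{(7)}), \tag{7.73}$$

(here one should notice that the sizes of $g$ and $g^{(j)}$ are the same according to definition (7.67)). More generally we furthermore have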


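$$\frac{\partial^k}{\partial(X_{i\mu})^k}\big(|\mathcal{F}_{ab}(X)|^{2q}\big) = (-1)^k\sum_{\sum_r(k_r + \tilde k_r) = k}\frac{k!}{\prod_r k_r!\,\tilde k_r!}\prod_{r=1}^{q}\Big(\sum_{\substack{g_r \in \mathfrak{G}_{k_r} \cup \mathfrak{G}_{jk_r},\ \tilde g_r \in \mathfrak{G}_{\tilde k_r} \cup \mathfrak{G}_{j\tilde k_r}\\ R_j,\ j = 2, \ldots, k\\ \mathcal{R}_{jl},\ j = 1, \ldots, 7,\ l = 1, \ldots, k+1}}B_{a,b,i,\mu}(g_r)\,\overline{B_{a,b,i,\mu}(\tilde g_r)}\,\mathcal{T}_n^2(X)\Big) + O_\prec(0), \tag{7.74}$$

where $g_r \in \mathfrak{G}_{k_r} \cup \mathfrak{G}_{jk_r}$ means that the groups associated with the derivatives of $G(z)$ belong to $\mathfrak{G}_{k_r}$ and the groups associated with the derivatives of $F(z)$ belong to $\mathfrak{G}_{jk_r}$. In view of (7.74) and (7.61) to prove (7.59) it then suffices to show that

$$n^{-k/2}\sum_{i=1}^{p}\sum_{\mu=1}^{n}E\Big[\prod_{r=1}^{q}B_{a,b,i,\mu}(g_r)\,\overline{B_{a,b,i,\mu}(\tilde g_r)}\,\mathcal{T}_n^{2q}(X)\Big] = O\big((n^{24\delta}\Psi)^{2q} + \|E|\mathcal{F}(X)|^{2q}\|_\infty\big), \tag{7.75}$$

for $4 \le k \le 16q$ and groups $g_r \in \mathfrak{G}_{k_r} \cup \mathfrak{G}_{jk_r}$, $\tilde g_r \in \mathfrak{G}_{\tilde k_r} \cup \mathfrak{G}_{j\tilde k_r}$ satisfying $\sum_r(k(g_r) + k(\tilde g_r)) = k$. To simplify notations, we drop complex conjugates (which will complicate the notations but the proof is the same) from the left hand side of (7.75). Without loss of generality, suppose there are $(2q - l)$ terms such that $k(g_r) = 0$ and denote each of them by $g_0$. (7.75) reduces to

$$n^{-k/2}\sum_{i=1}^{p}\sum_{\mu=1}^{n}E\Big[B_{a,b,i,\mu}(g_0)^{2q-l}\prod_{r=1}^{l}B_{a,b,i,\mu}(g_r)\,\mathcal{T}_n^{2q}(X)\Big] = O\big((n^{24\delta}\Psi)^{2q} + \|E|\mathcal{F}(X)|^{2q}\|_\infty\big), \tag{7.76}$$

for $4 \le k \le 16q$ and groups $g_r \in \mathfrak{G}_{k_r} \cup \mathfrak{G}_{jk_r}$ satisfying $\sum_{r=1}^{l}k(g_r) = k$ and $k(g_0) = 0$.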
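To estimate the left hand side of (7.76), we introduce the notations

$$\mathcal{H}_i = \mathcal{H}_{1i} + \mathcal{H}_{abi}, \quad \mathcal{H}_{1i} = |(J_1GA^T)_{ai}| + |(AGJ_2^T)_{ib}|, \quad \mathcal{H}_{abi} = \sum_{\mathcal{R} \in \mathcal{Q}(k)}\big(|(J_1\mathcal{R})_{ai}| + |(\mathcal{R}J_2^T)_{ib}|\big),$$

$$\mathcal{H}_\mu = \mathcal{H}_{1\mu} + \mathcal{H}_{ab\mu}, \quad \mathcal{H}_{1\mu} = |(J_1G)_{a(\mu+p)}| + |(GJ_2^T)_{(\mu+p)b}|, \quad \mathcal{H}_{ab\mu} = \sum_{\mathcal{R} \in \mathcal{Q}(k)}\big(|(J_1\mathcal{R})_{a\mu}| + |(\mathcal{R}J_2^T)_{\mu b}|\big),$$

where the lower indices $i$ and $\mu$ at $J_1\mathcal{R}$ and $\mathcal{R}J_2^T$ respectively represent the index $i$, $i+p-m$, $i+p$, and $\mu$, $\mu+p-n$ or $\mu+p$ depending on which block we consider (or differentiate). By (7.53), (7.27) and (7.64) we have

$$\mathcal{H}_i + \mathcal{H}_\mu \prec n^{2\delta}. \tag{7.77}$$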

Moreover for $g_r \in \mathfrak{G}_{k_r} \cup \mathfrak{G}_{jk_r}$, we similarly obtain from (7.53), (7.27), (7.64) and definition (7.67) that

$$|B_{a,b,i,\mu}(g_r)| \prec n^{2\delta(k(g_r)+1)}, \tag{7.78}$$

(recall $k(g) = k(g^{(j)})$ from definition (7.67)). Likewise, for $k(g_r) > 1$, we have

$$|B_{a,b,i,\mu}(g_r)| \prec (\mathcal{H}_i^2 + \mathcal{H}_\mu^2)\,n^{2\delta(k(g_r)-1)}, \tag{7.79}$$

while for $k(g_r) = 1$,

$$|B_{a,b,i,\mu}(g_r)| \prec \mathcal{H}_i\mathcal{H}_\mu. \tag{7.80}$$

When $k \le 2l - 2$ there must exist at least 2 $g_r$'s satisfying $k(g_r) = 1$ because $\sum_{r=1}^{l}k(g_r) = k \le 2l - 2$. It follows from (7.78) and (7.80) that


i 

\B a ,b,iA9o) 2q - 1 n B aA iA9r)\ -< / (Hl ^r'(X)(/(l >21- 1 m 2 +nl) 


r— 1 


+ I(k<2l-2)n 2 Hl). 


(7.81) 


Recalling the notation $A$ in (7.51) we have $\|AA^T\| \le M$. In view of (7.64) it is easy to see that

p n 


^ n &+ 

M= 1 


(7.82) 


i =1
(7.83)


E n lbi + E 

i or a or b a or /x 

where i or a or b means the summation over either i or a or b and 


9(JGJ *) aa + V 


nr] 


with J G C defined in (7.37). This implies that 


E n i + E x n< &+ n $>- 
1 


(7.84) 


i=l 


From (7.22) and (7.25) 


2 _ 9(JFJ*) aa + 3f(J(G - F)J*) aa + r? + 9f(J(G - F)J* 

' n ^ 


nr] 


nr] 


Recalling the definition of $\Psi$ in (7.8) we conclude that

tl^ViV + FaaiX)). (7.85) 

By (7.13), (7.26), (7.28), (7.30), the definition of $\mathcal{T}_n(X)$ and definition (7.67) we have

$$|B_{a,b,i,\mu}(g_0)B_{a,b,i,\mu}(g_r)\mathcal{T}_n(X)| \le n^{M_0}. \tag{7.86}$$

From (7.81), (7.84), (7.86) and (7.39) the left hand side of (7.76) is bounded in absolute value by 


Set 


n _ k/2+ 2 n35{k+ D Eir 2 ,-I (X ) r /(fc > 2l _ + ^ + 1{k < 2/ _ 2)( ^4 + 0 4 } 


P 2 ? _ rfq , p2q 

r l ~ r aa w r ba . 


+ n 


-D 


(7.87) 


We conclude from (7.85)-(7.86) and (7.39) that the left hand side of (7.76) is bounded in absolute value by

$$\begin{cases}
n^{3\delta(k+l)}\big(\Psi^{k-2}E\mathcal{F}_1^{2q-l}(X) + \Psi^{k-3}E\mathcal{F}_1^{2q-l+1}(X)\big) + n^{-D}, & \text{if } k \ge 2l-1,\\
n^{3\delta(k+l)}\big(\Psi^{k}E\mathcal{F}_1^{2q-l}(X) + \Psi^{k-2}E\mathcal{F}_1^{2q-l+2}(X)\big) + n^{-D}, & \text{if } k \le 2l-2.
\end{cases} \tag{7.88}$$

Since $l \le k$, (7.88) is further bounded by

$$\begin{cases}
(n^{24\delta}\Psi)^{k-2}E\mathcal{F}_1^{2q-l}(X) + (n^{24\delta}\Psi)^{k-3}E\mathcal{F}_1^{2q-l+1}(X) + n^{-D}, & \text{if } k \ge 2l-1,\\
(n^{24\delta}\Psi)^{k}E\mathcal{F}_1^{2q-l}(X) + (n^{24\delta}\Psi)^{k-2}E\mathcal{F}_1^{2q-l+2}(X) + n^{-D}, & \text{if } k \le 2l-2.
\end{cases} \tag{7.89}$$

This ensures that the left hand side of (7.88) is bounded in absolute value by

$$(n^{24\delta}\Psi)^{l}E\mathcal{F}_1^{2q-l}(X) + (n^{24\delta}\Psi)^{l-1}E\mathcal{F}_1^{2q-l+1}(X) + (n^{24\delta}\Psi)^{l-2}\,1(l \ge 3)\,E\mathcal{F}_1^{2q-l+2}(X) + n^{-D}, \tag{7.90}$$

where we use the facts that $k \ge l+2$ when $k \ge 4$ and $k \ge 2l-1$ and that $k \ge l$ and $l \ge 3$ when $k \le 2l-2$ and $k \ge 4$. When $l \ge 2$, (7.76) follows from (7.90), the facts that $(E|X|^r)^{1/r}$ is a nondecreasing function of $r$ and that $n^{-D} \le (n^{24\delta}\Psi)^{2q}$ for sufficiently large $D$. For example

$$(n^{24\delta}\Psi)^{l-2}E\mathcal{F}_1^{2q-l+2}(X) \le \big(E\mathcal{F}_1^{2q}(X)\big)^{\frac{2q-l+2}{2q}}\big((n^{24\delta}\Psi)^{2q}\big)^{\frac{l-2}{2q}} \le \big(E(\mathcal{F}_1^{2q}(X)) + (n^{24\delta}\Psi)^{2q}\big)^{\frac{2q-l+2+l-2}{2q}}. \tag{7.91}$$

When $l = 1$, the first term can be handled similarly and the second term directly implies (7.76). Thus we have proved (7.19) in Theorem 7.1.
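The interpolation step (7.91) combines the monotonicity of $(E|X|^r)^{1/r}$ with the elementary bound $a^{\theta}b^{1-\theta} \le a + b$ for $a, b \ge 0$ and $\theta \in [0, 1]$. The following is a small numerical sketch of that chain for an empirical distribution; the sample values, the parameter `psi` (standing in for $n^{24\delta}\Psi$) and the function names are assumptions made only for illustration.

```python
import random


def moment(samples, r):
    """Empirical r-th absolute moment of the sample."""
    return sum(abs(x) ** r for x in samples) / len(samples)


def interpolation_sides(samples, psi, q, l):
    """Return the left, middle and right quantities of the chain (7.91)."""
    left = psi ** (l - 2) * moment(samples, 2 * q - l + 2)
    theta = (2 * q - l + 2) / (2 * q)            # exponent on the 2q-th moment
    middle = moment(samples, 2 * q) ** theta * (psi ** (2 * q)) ** (1 - theta)
    right = moment(samples, 2 * q) + psi ** (2 * q)
    return left, middle, right


random.seed(0)
samples = [random.gauss(0.0, 0.1) for _ in range(1000)]   # assumed toy values of F_1(X)
for l in range(2, 7):
    left, middle, right = interpolation_sides(samples, psi=0.05, q=3, l=l)
    print(l, left <= middle * (1 + 1e-12), middle <= right)
```

For $2 \le l \le 2q$ both comparisons should hold for every sample, because the first is Lyapunov's inequality for the empirical measure and the second does not depend on the distribution at all.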


7.2 Local law (7.20) 

This subsection is to prove (7.20) in Theorem 7.1, i.e.

$$|m_n(z) - m(z)| \prec \frac{1}{n\eta}. \tag{7.92}$$

As pointed out in the paragraph containing (7.32), (7.92) holds when the underlying distribution of $X_{ij}$ of $X$ is the standard Gaussian distribution. Moreover, we need to use the interpolation method to prove (7.20) for the general distributions as in proving (7.19). However we do not need induction on the imaginary part of $z$ unlike before due to existence of (7.19).

In order to prove (7.92) it suffices to show that

$$|m_n(z) - m(z)|\,\mathcal{T}_n(X) \prec \frac{1}{n\eta}.$$

As in (7.37) we introduce the notation $\mathcal{F}^{2q}(X, z)$ as follows

$$\mathcal{F}^{2q}(X, z) = |m_n(z) - m(z)|^{2q}\,\mathcal{T}_n^{2q}(X) = \Big|\frac{1}{m}\sum_{k=1}^{m}G_{kk}(z) - m(z)\Big|^{2q}\,\mathcal{T}_n^{2q}(X). \tag{7.93}$$

Checking on Lemmas 4, 6, 7, (7.45) and (7.59) in the last section we only need to show

$$n^{-k/2}\sum_{i=1}^{p}\sum_{\mu=1}^{n}E\Big[\Big(\frac{\partial}{\partial X_{i\mu}}\Big)^{k}\mathcal{F}^{2q}(X, z)\Big] = O\big((n^{\delta}\Psi^2)^{2q} + \|E\mathcal{F}^{2q}(X, z)\|_\infty\big), \quad k \ge 4, \tag{7.94}$$

where $\delta$ is sufficiently small so that $n^{\delta}$ is smaller than $n^{\varepsilon}$ before (7.93) due to the definition of the partial order. Applying the definition of $B_{a,b,i,\mu}$ in the previous section with $J_1 = J_2 = 1$ and $a = b = k$, it suffices to show that

$$n^{-k/2}\sum_{i=1}^{p}\sum_{\mu=1}^{n}E\prod_{h=1}^{2q}\Big[\frac{1}{m}\sum_{k=1}^{m}B_{k,k,i,\mu}(g_h)\Big] = O\big((n^{\delta}\Psi^2)^{2q} + \|E\mathcal{F}^{2q}(X, z)\|_\infty\big). \tag{7.95}$$

Notice that (7.19) holds uniformly for any unit deterministic vectors $v$, $w$ and $z \in S$. This, together
with (7.85), implies that 


b 2 < 'F 2 .

We then conclude from (7.81) and (7.84) that

$$\frac{1}{m}\sum_{k=1}^{m}B_{k,k,i,\mu}(g_h) \prec \Psi^2, \quad \text{for } k(g_h) \ge 1. \tag{7.96}$$

For future use, recalling (7.68) and (7.73) we also obtain from (7.83) and (7.96)

$$\frac{1}{m}\sum_{k=1}^{m}C_{k,k,i,\mu}(g_h) \prec \Psi^2, \quad \text{for } k(g_h) \ge 1. \tag{7.97}$$

As in (7.81) we then have

$$\Big|\Big(\frac{1}{m}\sum_{k=1}^{m}B_{k,k,i,\mu}(g_0)\Big)^{2q-l}\prod_{r=1}^{l}\Big(\frac{1}{m}\sum_{k=1}^{m}B_{k,k,i,\mu}(g_r)\Big)\mathcal{T}_n^{2q}(X)\Big| \prec \mathcal{F}^{2q-l}(X, z)\,\Psi^{2l}.$$

(7.95) and hence (7.20) then follow via (7.39) and an argument similar to (7.91).


7.3 Convergence rate on the right edge and universality 
7.3.1 Convergence rate on the right edge 

The aim of this subsection is to prove the following Lemma. 

Lemma 13. Denote by $\lambda_1$ the largest eigenvalue of $A$ in (7.1). Under conditions of Theorem 2.1,

$$\lambda_1 - \mu_m = O_\prec(n^{-2/3}).$$

Proof. The approach is similar to that in [8], [19] and [4]. Checking on the proof of Theorem 4.1 in [4] carefully, we find that (ii) in Theorem 4.1 in [4] and hence the lower bound of $\lambda_1$ of Lemma 13 still hold in our case because of (7.23) and (7.32). It then suffices to prove that for any small positive constant $\tau$

$$\lambda_1 \le \mu_m + n^{-2/3+\tau} \tag{7.98}$$

holds with high probability. Note that by (7.3) and Lemma 3

$$\|A\| \le M \tag{7.99}$$

with high probability for sufficiently large positive constant $M$ (here one should notice that $\|D^{-1}\| \le M$ with high probability due to (6.2) and (6.3)). For a suitably small $\tau$, set $z = E + i\eta$ and $\kappa = |E - \mu_m|$ where $E \in [\mu_m + n^{-2/3+\tau}, \mu_m + \tau^{-1}]$ and $\eta = n^{-1/2-\tau/4}\kappa^{1/4}$. By Lemma 2.3 of [4], we have

$$\Im m(z) \asymp \frac{\eta}{\sqrt{\kappa + \eta}} \ll \frac{1}{n\eta}, \tag{7.100}$$
where $\ll$ means much less than.
We furthermore claim that with high probability

$$|m_n - m| \ll \frac{1}{n\eta}. \tag{7.101}$$

Indeed, (7.101) holds when $X$ reduces to $X^0$ due to (4.6) in [4], (7.100) and (7.32). For the general distributions, (7.101) follows from (7.94) and (7.47). It follows from (7.100) and (7.101) that with high probability

$$\Im(m_n) \ll \frac{1}{n\eta}.$$

Moreover note that with high probability

$$\sum_i I(E - \eta \le \lambda_i \le E + \eta) \le Mn\eta\,\Im(m_n) \ll 1.$$

As a consequence there is no eigenvalue in $[E - \eta, E + \eta]$ with high probability. This, together with (7.99), ensures (7.98).

□ 
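The counting bound used at the end of the proof of Lemma 13 rests on a deterministic fact: every eigenvalue within distance $\eta$ of $E$ contributes at least $1/2$ to $n\eta\,\Im m_n(E + i\eta)$. A minimal sketch follows, assuming an arbitrary toy spectrum and the explicit constant 2 in place of the generic $M$; the function names are illustrative and not from the paper.

```python
import random


def stieltjes(eigs, z):
    """Empirical Stieltjes transform m_n(z) = (1/n) * sum_i 1 / (lam_i - z)."""
    return sum(1.0 / (lam - z) for lam in eigs) / len(eigs)


def count_in_window(eigs, E, eta):
    """Number of eigenvalues in the window [E - eta, E + eta]."""
    return sum(1 for lam in eigs if abs(lam - E) <= eta)


def counting_bound(eigs, E, eta):
    """Upper bound 2 * n * eta * Im m_n(E + i*eta) on the window count."""
    return 2 * len(eigs) * eta * stieltjes(eigs, complex(E, eta)).imag


random.seed(0)
eigs = [random.uniform(0.0, 4.0) for _ in range(500)]   # assumed toy spectrum
for eta in (0.01, 0.05, 0.2):
    print(eta, count_in_window(eigs, 2.0, eta), counting_bound(eigs, 2.0, eta))
```

In particular, whenever the bound is strictly below 1 the window contains no eigenvalue, which is how the estimate $\Im(m_n) \ll 1/(n\eta)$ is used above.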


7.3.2 Universality 

The aim of this subsection is to prove (ii) of Theorem 2.1. By (6.12) and (6.13), it suffices to prove edge universality at the rightmost edge of the support $\mu_m$. In other words, the asymptotic distribution of $\lambda_1$ is not affected by the distribution of the entries of $X$ under the 3rd moment matching condition. Similar to Theorem 6.4 of [8], we first show the following Green function comparison theorem.

Theorem 7.2. There exists $\varepsilon_0 > 0$. For any $\varepsilon < \varepsilon_0$, set $\eta = n^{-2/3-\varepsilon}$, $E_1, E_2 \in \mathbb{R}$ with $E_1 < E_2$ and

$$|E_1 - \mu_m|, |E_2 - \mu_m| \le n^{-2/3+\varepsilon}.$$

Suppose that $K : \mathbb{R} \to \mathbb{R}$ is a smooth function with bounded derivatives up to fifth order. Then there exists a constant $\phi > 0$ such that for large enough $n$

$$\Big|EK\Big(n\int_{E_1}^{E_2}\Im m_{X^1}(x + i\eta)\,dx\Big) - EK\Big(n\int_{E_1}^{E_2}\Im m_{X^0}(x + i\eta)\,dx\Big)\Big| \le n^{-\phi}, \tag{7.102}$$

(see Definition 1 or (2.8) for $X^1$ and $X^0$).


Proof. Unlike [15], [8] and [3] we use the interpolation method (7.43), which is succinct and powerful when proving the Green function comparison theorem. In view of (7.15) and (7.16) we have

$$\Big|EK\Big(n\int_{E_1}^{E_2}\Im m_{X^1}(x + i\eta)\,dx\Big) - EK\Big(n\int_{E_1}^{E_2}\Im m_{X^0}(x + i\eta)\,dx\Big)\Big| = \Big|EK\Big(n\int_{E_1}^{E_2}\Im m_{X^1}(x + i\eta)\mathcal{T}_n(X^1)\,dx\Big) - EK\Big(n\int_{E_1}^{E_2}\Im m_{X^0}(x + i\eta)\mathcal{T}_n(X^0)\,dx\Big)\Big| + O(n^{-3}). \tag{7.103}$$

Applying (7.43) with $\mathcal{F}(X) = K\big(n\int_{E_1}^{E_2}\Im m_X(x + i\eta)\mathcal{T}_n(X)\,dx\big)$ we only need to bound the following


m p 

EE 


i =1 /i=l 




(7.104) 


where 


r *#2 




= K(n / 9m t> x? (x + irj)T n (X^)dx), u = 0,1 


'Ei x (irt 


As in (7.56) and (7.57), we use Taylor's expansion up to order five to expand two functions $g(X^u_{i\mu})$, $u = 0, 1$ at the point 0. Then take the difference of the Taylor's expansions of $g(X^u_{i\mu})$, $u = 0, 1$. By the 3rd moments matching condition it then suffices to bound the fourth derivative


yy M r max\K^ r \x)\E ( n 


r= 1 ,-.,k r gn_j_ 

+.. + fc7>=4 


2=1 


r*^2 


El 


(*m) 


t,0 


da; , 


(7.105) 


and the fifth derivative corresponding to the remainder of integral form 
5 r / r E 2 


—^ ^ M r max|X^(x)|E JJ | n f 

v 71 r=1 fc 1 ,..,fc r6 N + x »=i \ 175 


+ • . + ^r—4 


Ei 


where M r is a constant depending on r only, rn' k ^ Q (•) denotes the Ljth derivative with respect to 


^iexf (* + iv)T n {X’“" 


tfixv 

‘ 7,i 


X r 

(W 


dx 


(7.106) 


(»**) 


t, 0 X? 


Xfp and 0 < 0 < 1. Here we ignore the terms involving the derivatives of 7^(X^ IM ) due to (7.15), 
(7.16) and (7.28). 

To investigate (7.105) and (7.106) we claim that it suffices to prove that 


n 


rE2 


'Ei 


™ ( E> b+w.wz?) 

-y ZfJ, \ r 1 J 

(vO 




dx^j -< (n3 +e T 2 ), 


(7.107) 


where $k \ge 1$. Indeed, if (7.107) holds then (7.107) still holds if $X_{i\mu}$ is replaced by $\theta X_{i\mu}$ by checking on the argument of (7.107). We then conclude that the facts that (7.105) $\prec n^{1/3+\varepsilon}\Psi^2$ and that (7.106) $\prec n^{-1/2+1/3+\varepsilon}\Psi^2$ follow from Lemma 5, (7.28) and an application of (7.56).

By (7.68) and (7.97) we have for k > 1 


X,. 

(»w 




which implies that (7.107) -< (n 3 +e \k 2 ). Here we would point out that the derivatives rri' k ^ x] (•) 

x“’ 


(»/*) 


m 


are of the form 4- Ck,k,i,^{9h) from (7.67), (7.68), (7.71), (7.73), (7.94) and (7.95). By Lemma 

k= 1 

2.3 of [4] we have 

T 2 x —— = 0(n _ i +e/2 ). 

n^/v
Summarizing the above we have shown that

rE 2 rE 2 j 

|E K{n / iJm x i(x + irj)dx) — Eif(n / 3m x o(i + ny)dx)| -<: n _ 3 +2e . 

«/ J Ei 

The proof is complete by choosing an appropriate $\varepsilon$. □

In order to prove the Tracy-Widom law, we need to connect the probability $P(\lambda_1 \le E)$ with Theorem 7.2.

By Lemma 13 we can fix $E^* \prec n^{-2/3}$ such that it suffices to consider $\lambda_1 \le \mu_m + E^*$. Choosing $|E - \mu_m| \prec n^{-2/3}$, $\eta = n^{-2/3-9\varepsilon}$ and $l = \frac{1}{2}n^{-2/3-\varepsilon}$, then for some sufficiently small constant $\varepsilon > 0$ and sufficiently large constant $D$, there exists a constant $n_0(\varepsilon, D)$ such that

$$EK\Big(\frac{n}{\pi}\int_{E-l}^{\mu_m+E^*}\Im m_{X^1}(x + i\eta)\,dx\Big) \le P(\lambda_1 \le E) \le EK\Big(\frac{n}{\pi}\int_{E+l}^{\mu_m+E^*}\Im m_{X^1}(x + i\eta)\,dx\Big) + n^{-D}, \tag{7.108}$$

where $n \ge n_0(\varepsilon, D)$ and $K$ is a smooth cutoff function satisfying the condition of $K$ in Theorem 7.2. We omit the proof of (7.108) because it is a standard procedure and one can refer to [8] or Corollary 5.1 of [4] for instance. Combining (7.108) with Theorem 7.2 one can prove Tracy-Widom's law directly (see the proof of Theorem 1.3 of [3]).